Neural Network Idea for Prisoner’s Dilemma in ml5.js

In each round of the Prisoner’s Dilemma, both players secretly choose either “collaborate” (work together) or “oppose” (betray the other player).
After they both decide, they reveal their choices:

If both collaborate then both get 1 point.
If one collaborates and the other opposes then the opposing player gets 2 points, and the collaborating player gets 0 points.
If both oppose then neither gets any points.
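
The scoring rules above can be sketched as a small helper. This is only an illustration: the numeric encoding (0 for collaborate, 1 for oppose) and the function names are assumptions, not part of the original idea.

```javascript
// Assumed encoding for this sketch: 0 = collaborate, 1 = oppose.
const COLLABORATE = 0;
const OPPOSE = 1;

// Returns [pointsForA, pointsForB] for a single round.
function scoreRound(moveA, moveB) {
  if (moveA === COLLABORATE && moveB === COLLABORATE) return [1, 1];
  if (moveA === OPPOSE && moveB === OPPOSE) return [0, 0];
  // Exactly one player opposed: that player gets 2, the other gets 0.
  return moveA === OPPOSE ? [2, 0] : [0, 2];
}

// Sums the points over a list of [moveA, moveB] rounds.
function scoreGame(rounds) {
  let totalA = 0;
  let totalB = 0;
  for (const [moveA, moveB] of rounds) {
    const [pointsA, pointsB] = scoreRound(moveA, moveB);
    totalA += pointsA;
    totalB += pointsB;
  }
  return [totalA, totalB];
}
```

For example, scoreGame([[1,0],[1,1],[0,0],[0,1]]) adds up the four rounds and returns the two totals as a pair.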

This happens over multiple rounds, and at the end, the player with the most points wins.

I want to implement a neural network in ml5.js that can play against a copy of itself or against a human player.

What is the input?

The model will have 4 inputs.

A 2D array of previous rounds, e.g. [[1,0],[1,1],[0,0],[0,1],[-1,-1],[-1,-1]], where a placeholder value (-1 or 2) marks rounds that have not been played yet
The score of the current player
The score of the opposing player
The round number
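
A neural network usually expects a flat list of numbers, so the inputs listed above would probably need to be packed into one array. Here is a rough sketch of how that could look. MAX_ROUNDS, EMPTY and encodeInputs are hypothetical names, and the fixed game length of 6 rounds is an assumption taken from the example history.

```javascript
// Assumptions for this sketch: a game has a fixed number of rounds,
// and -1 is the placeholder for rounds that have not been played yet.
const MAX_ROUNDS = 6;
const EMPTY = -1;

// Packs the history, both scores, and the round number into one flat array.
function encodeInputs(history, myScore, theirScore, roundNumber) {
  const flat = [];
  for (let i = 0; i < MAX_ROUNDS; i++) {
    // Pad with placeholders once the played rounds run out.
    const round = i < history.length ? history[i] : [EMPTY, EMPTY];
    flat.push(round[0], round[1]);
  }
  flat.push(myScore, theirScore, roundNumber);
  return flat;
}
```

With 6 rounds this always gives 15 numbers (12 for the history plus 3 for the scores and round number), so the input size stays the same no matter how far the game has progressed.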

What is the output?

The model will have 1 boolean output. If true then oppose, if false then collaborate.
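
If the model is set up as a classifier, its raw answer will likely be a list of labels with confidences rather than a plain boolean. The sketch below shows one way to turn that into the true/false output described above. The result shape (an array of { label, confidence } objects) and the label strings 'oppose' and 'collaborate' are assumptions for illustration.

```javascript
// Assumed input: an array like
// [{ label: 'oppose', confidence: 0.7 }, { label: 'collaborate', confidence: 0.3 }]
// Returns true for oppose, false for collaborate.
function chooseMove(results) {
  let best = results[0];
  for (const result of results) {
    if (result.confidence > best.confidence) best = result;
  }
  return best.label === 'oppose';
}
```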

What kind of learning task is it?

On the surface this is a classification problem, since the model maps each game state to one of two moves. Strictly speaking, though, learning a good strategy from game outcomes is a reinforcement learning problem, which can be hard to do in ml5.js.

What challenges do you anticipate?

There will be a few challenges.

I need enough data to train this model… or I can just spend the entire day playing against it? Or I can ask DeepSeek or a local LLM to play against it instead?
ml5.js may struggle with learning complex strategies
Overfitting (please don’t)
Real players may change tactics unpredictably, making it harder for a static model to perform well
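
For the data challenge, one more option (not among the ones listed above, so treat it as a suggestion only) would be to let simple scripted strategies play each other and record their moves as labeled examples. Note that this teaches the model to imitate a scripted strategy, which is plain supervised learning and not reinforcement learning. All names below are hypothetical, and moves use the assumed encoding 0 = collaborate, 1 = oppose.

```javascript
// Each strategy receives the history as [ownMove, opponentMove] pairs
// and returns 0 (collaborate) or 1 (oppose).
const titForTat = (history) =>
  history.length === 0 ? 0 : history[history.length - 1][1];
const alwaysOppose = () => 1;

// Plays strategyA against strategyB and records, for every round,
// the history strategyA saw and the move it then chose.
function generateExamples(strategyA, strategyB, rounds) {
  const examples = [];
  const historyA = []; // rounds from A's point of view
  const historyB = []; // the same rounds from B's point of view
  for (let r = 0; r < rounds; r++) {
    const moveA = strategyA(historyA);
    const moveB = strategyB(historyB);
    examples.push({
      history: historyA.map((pair) => [...pair]), // copy, so later rounds don't change it
      label: moveA === 1 ? 'oppose' : 'collaborate',
    });
    historyA.push([moveA, moveB]);
    historyB.push([moveB, moveA]);
  }
  return examples;
}
```

Calling generateExamples(titForTat, alwaysOppose, 3) gives three examples in which the recorded player collaborates first and then opposes, because tit-for-tat copies the opponent's previous move.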