Machine learning: specific strategy learned because of playing against specific agent?

Question

First of all, I found it difficult to formulate my question, so feedback is welcome.

I have to build a machine learning agent that plays Dots and Boxes.

I'm still in the early stages, but a question came up: if I let my machine learning agent (with a specific implementation) play against a copy of itself to learn and improve its gameplay, wouldn't it just learn a strategy that works against that one specific style of play?

Would it be better to let my agent play and learn against a variety of other agents, chosen arbitrarily?

Dennis Soemers · Accepted Answer

The idea of having an agent learn by playing against a copy of itself is referred to as self-play. Yes, in self-play agents can sometimes "overfit" to their "training partner", which makes the learning process unstable. See this blog post by OpenAI (in particular, the "Multiplayer" section), which describes exactly this issue.

The easiest way to address this that I've seen in research so far is indeed to generate a more diverse set of training partners. This can, for example, be done by storing checkpoints of multiple past versions of your agent (in memory or in files) and randomly picking one of them as the training partner at the start of every episode. This is roughly what was done during the self-play training of the original AlphaGo Go program by DeepMind (the 2016 version), and it is also described in another blog post by OpenAI.
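A minimal sketch of the checkpoint-pool idea, assuming a hypothetical Agent class whose parameters can be deep-copied. Agent, OpponentPool, train, max_size and checkpoint_every are all illustrative names, not from any library. Sampling uniformly and capping the pool at a fixed size are assumptions made for the sketch. The actual Dots and Boxes game and the learning update are left as placeholders.

```python
import copy
import random


class Agent:
    """Hypothetical stand-in for a learning agent; `weights` plays the role of its parameters."""

    def __init__(self):
        self.weights = {"bias": 0.0}
        self.version = 0

    def update(self):
        # Placeholder for a real learning step (e.g. a gradient or TD update).
        self.weights["bias"] += 1.0
        self.version += 1


class OpponentPool:
    """Stores frozen snapshots (checkpoints) of past versions of the agent."""

    def __init__(self, max_size=20, seed=None):
        self.max_size = max_size  # assumption: keep only the most recent checkpoints
        self.snapshots = []
        self.rng = random.Random(seed)

    def add_checkpoint(self, agent):
        # Deep-copy so that further training does not change the stored opponent.
        self.snapshots.append(copy.deepcopy(agent))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # drop the oldest checkpoint

    def sample(self):
        # Uniformly pick one past version as this episode's training partner.
        return self.rng.choice(self.snapshots)


def train(num_episodes=100, checkpoint_every=10, seed=0):
    learner = Agent()
    pool = OpponentPool(max_size=5, seed=seed)
    pool.add_checkpoint(learner)  # the initial agent is the first opponent
    opponents_seen = []
    for episode in range(1, num_episodes + 1):
        opponent = pool.sample()  # new training partner every episode
        opponents_seen.append(opponent.version)
        # A real implementation would play one game of Dots and Boxes
        # between `learner` and `opponent` here and learn from the outcome.
        learner.update()
        if episode % checkpoint_every == 0:
            pool.add_checkpoint(learner)
    return learner, pool, opponents_seen
```

With the defaults, `train()` saves a checkpoint every 10 episodes. Because the pool keeps at most 5 snapshots, it ends up holding versions 60 through 100. Sampling uniformly over the pool is the simplest choice. A real system might instead weight recent or stronger checkpoints more heavily.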
