user24851465

Reputation: 11

How to Implement 'game rules' when training a Deep Q Network

I am trying to make a Deep Q Network that teaches itself to play modified versions of tic-tac-toe (an m,n,k-game).

I want to make sure the network does not place a mark where there already is one.

I currently have two ideas for it:

  1. The agent can select the next action only from empty cells (see the sketch after this list)
  2. Let the agent choose any cell and give a penalty if it chooses an already-occupied cell (and end the episode)
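
To make option 1 concrete, this is roughly what I mean by selecting only from empty cells (just a sketch, not my actual code; q_net, state and board are placeholders for my network, its input tensor and the flat board array):

    import torch

    def select_action(q_net, state, board, epsilon=0.1):
        """Epsilon-greedy action selection restricted to empty cells.

        q_net, state and board are placeholders: my Q-network, its
        input tensor, and a flat list where 0 marks an empty cell.
        """
        # indices of all legal (empty) cells
        legal = torch.tensor([i for i, cell in enumerate(board) if cell == 0])

        if torch.rand(1).item() < epsilon:
            # explore: pick a random legal cell
            return legal[torch.randint(len(legal), (1,))].item()

        with torch.no_grad():
            q_values = q_net(state.unsqueeze(0)).squeeze(0)

        # mask occupied cells with -inf so the argmax can never pick them
        mask = torch.full_like(q_values, float("-inf"))
        mask[legal] = 0.0
        return torch.argmax(q_values + mask).item()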

I'm pretty sure both would work, but which one will be more efficient while training?

I am trying option 1, but I'm not sure the Q-values for 'illegal' cells are getting smaller, and each episode seems to take too long.
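
This is roughly how I am checking whether those Q-values actually shrink (again only a sketch, with the same placeholder names):

    import torch

    def illegal_q_values(q_net, state, board):
        """Return the current Q-values of the already-occupied cells,
        so I can log them over training and see whether they go down.
        (q_net, state and board are the same placeholders as above.)"""
        with torch.no_grad():
            q_values = q_net(state.unsqueeze(0)).squeeze(0)
        occupied = [i for i, cell in enumerate(board) if cell != 0]
        return q_values[occupied].tolist()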

Upvotes: 1

Views: 43

Answers (0)
