Amir_P

Reputation: 9019

Deep Q Learning For Snake Game

I'm working on a project based on the Keras Plays Catch code. I have changed the game to a simple Snake game, and for the sake of simplicity I represent the snake as a single dot on the board. If the snake eats the reward it gets a +5 score, for hitting a wall it gets -5, and for every move -0.1. But it's not learning the strategy and gives terrible results. Here is my game's play function:

def play(self, action):
    # Move the snake one tile: 0 = up, 1 = right, 2 = down, 3 = left.
    if action == 0:
        self.snake = (self.snake[0] - 1, self.snake[1])
    elif action == 1:
        self.snake = (self.snake[0], self.snake[1] + 1)
    elif action == 2:
        self.snake = (self.snake[0] + 1, self.snake[1])
    else:
        self.snake = (self.snake[0], self.snake[1] - 1)

    # +5 for eating the reward, -5 for hitting a wall, -0.1 per move.
    if self.snake == self.reward:
        score = 5
        self.setReward()
    elif self.isGameOver():
        score = -5
    else:
        score = -0.1

    return self.getBoard(), score, self.isGameOver()

which returns something like this (1 is the snake, 3 is the reward, and 2 represents the walls):

 [[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 1. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 3. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 0. 0. 0. 0. 0. 0. 0. 0. 2.]
 [2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]
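For context, the helpers that play relies on behave roughly like this (a simplified sketch consistent with the board above, not my exact code):

import numpy as np
import random

class SnakeGame:
    def __init__(self, size=10):
        self.size = size
        self.snake = (size // 2, size // 2)
        self.setReward()

    def setReward(self):
        # Drop the reward on a random interior tile that isn't the snake.
        while True:
            pos = (random.randint(1, self.size - 2),
                   random.randint(1, self.size - 2))
            if pos != self.snake:
                self.reward = pos
                break

    def isGameOver(self):
        # The game ends when the snake steps onto a border (wall) tile.
        row, col = self.snake
        return row in (0, self.size - 1) or col in (0, self.size - 1)

    def getBoard(self):
        # 0 = empty, 1 = snake, 2 = wall, 3 = reward.
        board = np.zeros((self.size, self.size))
        board[0, :] = board[-1, :] = board[:, 0] = board[:, -1] = 2
        board[self.reward] = 3
        board[self.snake] = 1
        return board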

and here is my Q-learning code in a gist.
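It roughly follows the Keras Plays Catch structure. As a minimal sketch of that kind of loop (the network size, epsilon, and episode cap here are illustrative, and the replay/training step is elided, so this is not my exact gist):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten

num_actions = 4   # up, right, down, left
epsilon = 0.1     # exploration rate

# Small dense network mapping the flattened 10x10 board to Q-values.
model = Sequential([
    Flatten(input_shape=(10, 10)),
    Dense(100, activation='relu'),
    Dense(num_actions)
])
model.compile(optimizer='adam', loss='mse')

for episode in range(1000):
    game = SnakeGame()
    state = game.getBoard()
    for step in range(200):           # cap the episode length
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)   # explore
        else:
            q_values = model.predict(state[np.newaxis])[0]
            action = int(np.argmax(q_values))         # exploit
        next_state, score, done = game.play(action)
        # ... store (state, action, score, next_state, done) in replay
        # memory and fit the model on a sampled minibatch here ...
        state = next_state
        if done:
            break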

I don't know what I'm doing wrong, but in most of the games it plays it gets stuck in a loop (up and down, or left and right), or it heads straight into a wall, and there is only a small chance of it eating the reward before it dies. How can I improve it and make it work?

Upvotes: 4

Views: 947

Answers (1)

knh190

Reputation: 2882

If your snake never hits the reward, it may never learn the +5 score. Instead of a constant -0.1 penalty per move, a distance-based cost for each tile will probably help. In other words, the agent in your game is not aware of the existence of the reward.
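For example, a sketch of what I mean (the base and scale values are just illustrative):

def distance_cost(snake, reward, base=-0.1, scale=0.05):
    # Manhattan distance to the reward: tiles farther from the reward
    # cost more per move, giving the agent a gradient to follow.
    dist = abs(snake[0] - reward[0]) + abs(snake[1] - reward[1])
    return base - scale * dist

This way a move toward the reward is penalized less than a move away from it, so a learning signal exists even before the first +5 is ever collected.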

I think eventually you'll end up with something like A* pathfinding; at least the heuristics are similar.


Update:

Considering the complete code you've posted, your loss function and the score don't match! When the score is high, your model's loss is random.

Try maximizing the game score as your goal.
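Concretely, the usual Q-learning way to tie the loss to the score is to regress Q(s, a) toward the Bellman target r + gamma * max Q(s', a'). A sketch of that target construction (gamma and the batch layout are illustrative):

import numpy as np

gamma = 0.9  # discount factor

def q_targets(model, batch):
    # batch: list of (state, action, score, next_state, done) tuples.
    states = np.array([s for s, *_ in batch])
    next_states = np.array([ns for _, _, _, ns, _ in batch])
    targets = model.predict(states)
    next_q = model.predict(next_states)
    for i, (_, action, score, _, done) in enumerate(batch):
        # Only the taken action's Q-value is pushed toward the target;
        # the other actions keep their predictions (zero loss there).
        targets[i, action] = score if done else score + gamma * next_q[i].max()
    return states, targets

Training then becomes model.train_on_batch(*q_targets(model, minibatch)), so a higher score directly raises the regression target instead of being unrelated to the loss.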

Upvotes: 1
