Reputation: 121
I'm trying to train an AI to play snake with a genetic algorithm. I'm using the Python library NEAT for the training. The problem is that the training doesn't converge and the AI doesn't learn. Here's the training code:
class SnakeEnv():
    def __init__(self, screen):
        self.action_space = np.array([0, 1, 2, 3])
        self.state = None
        pygame.init()
        self.screen = screen
        self.snakes = []
        self.total_reward = 0

    def reset(self):
        self.__init__()

    def get_state(self):
        return np.reshape(self.snake.board, (400, 1)).T / 5

    def render(self, snake):
        self.screen.fill((0, 0, 0))
        snake.food.render()
        snake.render()
        pygame.display.flip()

    def step(self, snake, action):
        snake.move(action)
        self.render(snake)

    def close(self):
        pygame.quit()

    def eval_genomes(self, genomes, config):
        global nets_g
        nets_g = []
        nets = []
        snakes = []
        global ge_g
        ge_g = []
        ge = []
        for genome_id, genome in genomes:
            genome.fitness = 0
            net = neat.nn.FeedForwardNetwork.create(genome, config)
            nets.append(net)
            snakes.append(Snake(self.screen))
            ge.append(genome)
        ge_g = ge.copy()
        nets_g = nets.copy()
        run = True
        # Main loop
        while run:
            for event in pygame.event.get():
                if event.type == pygame.QUIT:
                    run = False
                    pygame.quit()
                    quit()
                    break
            for x, snake in enumerate(snakes):
                if snake.done:
                    continue
                ge[x].fitness += 0.1
                """
                Inputs to the neural net:
                Vertical distance from food to head
                Horizontal distance from food to head
                Vertical distance to nearest wall from head
                Horizontal distance to nearest wall from head
                Distance from head to body segment (default -1)
                """
                snake_x = snake.head.x
                snake_y = snake.head.y
                food_x = snake.food.x
                food_y = snake.food.y
                food_vert = snake_y - food_y
                food_horz = snake_x - food_x
                wall_vert = min(snake_y, 600 - snake_y)
                wall_horz = min(snake_x, 600 - snake_x)
                body_front = snake.body_front()
                output = np.argmax(nets[snakes.index(snake)].activate((food_vert, food_horz, wall_vert, wall_horz, body_front)))
                state = snake.move(output)
                if state["Food"] == True:
                    ge[snakes.index(snake)].fitness += 1
                if state["Died"] == True:
                    ge[snakes.index(snake)].fitness -= 1
                    #nets.pop(snakes.index(snake))
                    #ge.pop(snakes.index(snake))
                    #snakes.pop(snakes.index(snake))
            all_done = [snake.done for snake in snakes]
            if False not in all_done:
                run = False

    def run(self, config_file):
        config = neat.config.Config(neat.DefaultGenome, neat.DefaultReproduction, neat.DefaultSpeciesSet, neat.DefaultStagnation, config_file)
        population = neat.Population(config)
        population.add_reporter(neat.StdOutReporter(True))
        stats = neat.StatisticsReporter()
        population.add_reporter(stats)
        best = population.run(self.eval_genomes, 200)
        print('\nBest genome:\n{!s}'.format(best))
        best_net = nets_g[ge_g.index(best)]
        pickle.dump(best_net, open('best.pkl', 'wb'))
Here's the conf.txt file:
[NEAT]
fitness_criterion = max
fitness_threshold = 20
pop_size = 50
reset_on_extinction = False
[DefaultGenome]
# node activation options
activation_default = relu
activation_mutate_rate = 0.0
activation_options = relu
# node aggregation options
aggregation_default = sum
aggregation_mutate_rate = 0.0
aggregation_options = sum
# node bias options
bias_init_mean = 0.0
bias_init_stdev = 1.0
bias_max_value = 10.0
bias_min_value = -10.0
bias_mutate_power = 0.5
bias_mutate_rate = 0.9
bias_replace_rate = 0.1
# genome compatibility options
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient = 0.5
# connection add/remove rates
conn_add_prob = 0.7
conn_delete_prob = 0.7
# connection enable options
enabled_default = True
enabled_mutate_rate = 0.01
feed_forward = True
initial_connection = full
# node add/remove rates
node_add_prob = 0.7
node_delete_prob = 0.7
# network parameters
num_hidden = 0
num_inputs = 5
num_outputs = 4
# node response options
response_init_mean = 1.0
response_init_stdev = 0.0
response_max_value = 30.0
response_min_value = -30.0
response_mutate_power = 0.0
response_mutate_rate = 0.0
response_replace_rate = 0.0
# connection weight options
weight_init_mean = 0.0
weight_init_stdev = 1.0
weight_max_value = 30
weight_min_value = -30
weight_mutate_power = 0.5
weight_mutate_rate = 0.8
weight_replace_rate = 0.1
[DefaultSpeciesSet]
compatibility_threshold = 3.0
[DefaultStagnation]
species_fitness_func = max
max_stagnation = 20
species_elitism = 2
[DefaultReproduction]
elitism = 2
survival_threshold = 0.2
As you can see I train for 200 generations. The results are pretty odd. The snake consistently gets a single piece of food but then immediately runs into a wall. It's sort of learning but not fully. I've tried to let it train for more generations, but there's no difference. I think the problem may be with my inputs to the neural nets, but I'm not sure.
EDIT: I changed the network architecture so that it now has 4 output nodes with a relu activation. The problem is now that the code freezes on the line where the output is computed (output = np.argmax(nets[snakes.index(snake)].activate((food_vert, food_horz, wall_vert, wall_horz, body_front)))).
Upvotes: 0
Views: 640
Reputation: 5467
From glancing over your code, you seem to have some bug(s):
for x, snake in enumerate(snakes):
    ge[x].fitness += 0.1
Within the for loop you are pop()-ing elements from both the snakes and the ge lists. In Python you should never modify a list while iterating over it. Later in the loop you use snakes.index(snake) instead of x to index the same list. Because of this, the reward for staying alive will probably go to the wrong snakes.
You could copy the list before iterating, but repeating snakes.index(snake) everywhere is also an anti-pattern. You need a different solution; for example, you could use a snake.dead flag.
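A minimal sketch of that pattern, using a hypothetical stand-in Snake class (not your actual one): keep all parallel lists at a fixed length for the whole generation, index them with the enumerate index x, and skip dead snakes instead of removing them.

```python
# Sketch: never pop() mid-iteration; mark dead snakes and skip them,
# so the snakes/fitness indices stay aligned all generation long.
class Snake:                      # hypothetical stand-in class
    def __init__(self):
        self.dead = False

snakes = [Snake() for _ in range(3)]
fitness = [0.0] * len(snakes)
snakes[1].dead = True             # pretend this one already crashed

for x, snake in enumerate(snakes):
    if snake.dead:
        continue                  # skip, but never remove
    fitness[x] += 0.1             # reward reaches the right snake via x
```

Because nothing is ever removed, x stays valid as an index into nets, ge, and snakes simultaneously.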
You seem to be scaling the output of a single neuron into an integer range. This makes the task a bit difficult (but not impossible) for the NN to solve, because close-together numbers don't actually map to similar actions.
The more common approach would be to use a separate neuron for each output, and select the action with highest activation. (Or to use softmax to select an action with random probabilities. This adds noise but makes the fitness landscape much smoother, because then even small changes to the weights will have some effect on fitness.)
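Both selection schemes can be sketched in a few lines. This is a generic illustration (the function name and signature are my own, not part of NEAT); it assumes the net returns a list of 4 raw activations:

```python
import numpy as np

def select_action(activations, greedy=True, rng=np.random.default_rng(0)):
    """Pick one of the actions from the net's raw output activations."""
    a = np.asarray(activations, dtype=float)
    if greedy:
        return int(np.argmax(a))        # deterministic: highest activation wins
    p = np.exp(a - a.max())             # softmax, shifted for numerical stability
    p /= p.sum()
    return int(rng.choice(len(a), p=p)) # stochastic: smoother fitness landscape

select_action([0.1, 2.0, -1.0, 0.5])    # greedy -> action 1
```

With greedy=False, even a small weight change shifts the action probabilities and therefore the expected fitness, which is what smooths the fitness landscape.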
You cannot expect to write bug-free code, and when your code is part of an optimization loop, debugging is extra tricky, because the optimization changes the effect of bugs.
Run your code in a simpler setting first. For example, you could ignore the output of the neural net and always do the same action (or random actions) instead. Think about what is supposed to happen. Maybe track some snakes and their reward manually step-by-step, e.g. with print statements.
The point is: reduce the number of things you are debugging at the same time.
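As an illustration of that kind of sanity check, here is a sketch of a random-action rollout. It assumes a hypothetical snake object with the same move()/state-dict interface as in the question; the function and its reward bookkeeping are mine, not your code:

```python
import random

def debug_rollout(snake, steps=100):
    """Drive one snake with random actions (no neural net) and log what happens."""
    fitness = 0.0
    for t in range(steps):
        action = random.randrange(4)   # ignore the net entirely
        state = snake.move(action)
        fitness += 0.1                 # survival reward, as in eval_genomes
        if state["Food"]:
            fitness += 1
            print(f"step {t}: ate food, fitness={fitness:.1f}")
        if state["Died"]:
            fitness -= 1
            print(f"step {t}: died, fitness={fitness:.1f}")
            break
    return fitness
```

If the rewards printed here don't match what you expect by hand, the bug is in the environment, not in the neural nets.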
Upvotes: 2