Reputation: 453
I am having an issue with the results I get from value iteration: the values keep increasing towards infinity, so I assume there is a problem somewhere in my logic.
Initially I have a 10x10 grid, some tiles with a reward of +10, some with a reward of -100, and some with a reward of 0. There are no terminal states. The agent can perform 4 non-deterministic actions: move up, down, left, and right. It has an 80% chance of moving in the chosen direction and a 20% chance of moving perpendicular to it (10% to each side).
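To give an idea of the setup, it looks roughly like this (not my exact code, and the reward positions here are made up):

rewards = [[0] * 10 for _ in range(10)]   # 10x10 grid, mostly reward 0
rewards[2][7] = 10                        # example +10 tile (position made up)
rewards[5][5] = -100                      # example -100 tile (position made up)
P_INTENDED = 0.8                          # chance of moving in the chosen direction
P_SLIP = 0.1                              # chance of slipping to either perpendicular side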
My process is to loop over every tile and compute the expected value of each of the four actions from it. For example, to calculate the value of going north from a given tile:
self.northVal = 0
self.northVal += (0.1 * grid[x-1][y])  # 10% chance of slipping to one side
self.northVal += (0.1 * grid[x+1][y])  # 10% chance of slipping to the other side
self.northVal += (0.8 * grid[x][y+1])  # 80% chance of actually moving north
I would appreciate any guidance!
Upvotes: 0
Views: 1359
Reputation: 63
What you're trying to do here is not value iteration: value iteration works with a state-value function, where you store a single value for each state. That means you don't keep an estimate for each (state, action) pair; action values are only computed on the fly when taking the max over actions.
Please refer to the 2nd edition of the Sutton and Barto book (Section 4.4) for the full explanation, but the core update, applied repeatedly to every state s, is:

V(s) <- max_a sum_{s'} p(s' | s, a) * [ r(s, a, s') + gamma * V(s') ]

Note the initialization step: you only need a single vector (or grid) storing one value per state.
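If it helps, here is a rough sketch of what that could look like for a grid world like yours. It is only an illustration under my own assumptions (the reward is collected on the tile you land on, a discount factor gamma < 1 is used, and bumping into a wall leaves the agent in place); adapt it to your actual setup:

def value_iteration(rewards, gamma=0.9, theta=1e-6):
    rows, cols = len(rewards), len(rewards[0])
    V = [[0.0] * cols for _ in range(rows)]   # one value per state, nothing per action

    moves = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
    perp = {'U': ('L', 'R'), 'D': ('L', 'R'), 'L': ('U', 'D'), 'R': ('U', 'D')}

    def next_cell(r, c, action):
        dr, dc = moves[action]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols):   # assumption: walls keep you in place
            return r, c
        return nr, nc

    while True:
        delta = 0.0
        for r in range(rows):
            for c in range(cols):
                best = float('-inf')
                for a in moves:                        # evaluate each action...
                    outcomes = [(0.8, a), (0.1, perp[a][0]), (0.1, perp[a][1])]
                    q = 0.0
                    for prob, actual in outcomes:
                        nr, nc = next_cell(r, c, actual)
                        # reward of the tile you land on, plus discounted future value
                        q += prob * (rewards[nr][nc] + gamma * V[nr][nc])
                    best = max(best, q)                # ...but store only the max per state
                delta = max(delta, abs(best - V[r][c]))
                V[r][c] = best
        if delta < theta:                              # stop once values stop changing
            return V

You would call it as V = value_iteration(rewards, gamma=0.9). The gamma < 1 matters here: since your grid has no terminal states, an undiscounted sum of rewards can grow without bound, which is likely why your values keep increasing.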
Upvotes: 0