Reputation: 453
I am having an issue with the results I get from value iteration: the values keep increasing towards infinity, so I assume there is a problem somewhere in my logic.
Initially I have a 10x10 grid, some tiles with a reward of +10, some with a reward of -100, and some with a reward of 0. There are no terminal states. The agent can perform 4 non-deterministic actions: move up, down, left, and right. It has an 80% chance of moving in the chosen direction and a 20% chance of moving perpendicular to it (10% to each side).
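To give an idea of the setup, it looks roughly like this (not my exact code, and the reward positions here are made up):

rewards = [[0] * 10 for _ in range(10)]   # 10x10 grid, mostly reward 0
rewards[2][7] = 10                        # example +10 tile (position made up)
rewards[5][5] = -100                      # example -100 tile (position made up)
P_INTENDED = 0.8                          # chance of moving in the chosen direction
P_SLIP = 0.1                              # chance of slipping to either perpendicular side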
My process is to loop over every tile and compute the expected value of each of the four actions from it. For example, to calculate the value of going north from a given tile:
self.northVal = 0
self.northVal += (0.1 * grid[x-1][y])  # 10% chance of slipping to one side
self.northVal += (0.1 * grid[x+1][y])  # 10% chance of slipping to the other side
self.northVal += (0.8 * grid[x][y+1])  # 80% chance of actually moving north
I would appreciate any guidance!
Upvotes: 0
Views: 1359
Reputation: 63
What you're trying to do here is not value iteration: value iteration works with a state-value function, where you store a single value for each state. That means you don't keep an estimate for each (state, action) pair; action values are only computed on the fly when taking the max over actions.
Please refer to the 2nd edition of the Sutton and Barto book (Section 4.4) for the full explanation, but the core update, applied repeatedly to every state s, is:

V(s) <- max_a sum_{s'} p(s' | s, a) * [ r(s, a, s') + gamma * V(s') ]

Note the initialization step: you only need a single vector (or grid) storing one value per state.
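If it helps, here is a rough sketch of what that could look like for a grid world like yours. It is only an illustration under my own assumptions (the reward is collected on the tile you land on, a discount factor gamma < 1 is used, and bumping into a wall leaves the agent in place); adapt it to your actual setup:

def value_iteration(rewards, gamma=0.9, theta=1e-6):
    rows, cols = len(rewards), len(rewards[0])
    V = [[0.0] * cols for _ in range(rows)]   # one value per state, nothing per action

    moves = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
    perp = {'U': ('L', 'R'), 'D': ('L', 'R'), 'L': ('U', 'D'), 'R': ('U', 'D')}

    def next_cell(r, c, action):
        dr, dc = moves[action]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < rows and 0 <= nc < cols):   # assumption: walls keep you in place
            return r, c
        return nr, nc

    while True:
        delta = 0.0
        for r in range(rows):
            for c in range(cols):
                best = float('-inf')
                for a in moves:                        # evaluate each action...
                    outcomes = [(0.8, a), (0.1, perp[a][0]), (0.1, perp[a][1])]
                    q = 0.0
                    for prob, actual in outcomes:
                        nr, nc = next_cell(r, c, actual)
                        # reward of the tile you land on, plus discounted future value
                        q += prob * (rewards[nr][nc] + gamma * V[nr][nc])
                    best = max(best, q)                # ...but store only the max per state
                delta = max(delta, abs(best - V[r][c]))
                V[r][c] = best
        if delta < theta:                              # stop once values stop changing
            return V

You would call it as V = value_iteration(rewards, gamma=0.9). The gamma < 1 matters here: since your grid has no terminal states, an undiscounted sum of rewards can grow without bound, which is likely why your values keep increasing.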
Upvotes: 0