Reputation: 581
I've got an MDP problem with the following environment (a 3x4 map),
with the possible actions Up/Down/Right/Left and a 0.8 chance of moving in the intended direction and 0.1 for each adjoining direction (e.g. for Up: 0.1 chance to go Left, 0.1 chance to go Right).
Now what I need to do is calculate the possible outcomes when starting in (1,1) and running the following sequence of actions:
[Up, Up, Right, Right, Right]
And also calculate, for each possible outcome, the chance of ending up in that field with this action sequence. How can I do this efficiently, i.e. without going through all of the at least 2^5 and at most 3^5 possible paths?
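For concreteness, here is a small sketch of the one-step transition model I have in mind, assuming that a move which would leave the grid keeps the agent in its current field and that there are no walls:

    ROWS, COLS = 3, 4
    MOVES = {"Up": (1, 0), "Down": (-1, 0), "Right": (0, 1), "Left": (0, -1)}
    # the two "adjoining" directions for each intended action
    ADJOINING = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
                 "Left": ("Up", "Down"), "Right": ("Up", "Down")}

    def move(cell, direction):
        """Deterministic move; bumping into the boundary leaves the agent in place."""
        r, c = cell
        dr, dc = MOVES[direction]
        nr, nc = r + dr, c + dc
        return (nr, nc) if 1 <= nr <= ROWS and 1 <= nc <= COLS else (r, c)

    def transition(cell, action):
        """Distribution over next fields: 0.8 intended, 0.1 for each adjoining direction."""
        result = {}
        for direction, p in [(action, 0.8)] + [(d, 0.1) for d in ADJOINING[action]]:
            nxt = move(cell, direction)
            result[nxt] = result.get(nxt, 0.0) + p
        return result

    print(transition((1, 1), "Up"))   # {(2, 1): 0.8, (1, 1): 0.1, (1, 2): 0.1}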
Thanks in advance!
Upvotes: -1
Views: 358
Reputation: 11
I wonder if you are treating this as a reinforcement learning problem. Nowadays RL problems like this are usually solved with the Bellman equation and Q-learning.
You may also benefit from this lecture: http://cs229.stanford.edu/notes/cs229-notes12.pdf
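For reference, the tabular Q-learning update is a sample-based version of the Bellman optimality equation (my notation; alpha and gamma appear as lr and r in the code further down, R is the observed reward):

    Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]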
Once learning has finished, repeat the whole process many times and you will get the probabilities for [Up, Up, Right, Right, Right].
After learning, your second constraint will hardly matter, because the agent reaches the correct answer almost immediately.
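If you only need the chances for that one fixed action sequence, the "repeat the whole process" idea can also be done without any learning: simulate the slippery moves many times and count where you end up. This is only a rough sketch under my own assumptions (3x4 grid indexed from (1,1), a move into the boundary keeps you in place, no walls):

    import random
    from collections import Counter

    ROWS, COLS = 3, 4
    MOVES = {"Up": (1, 0), "Down": (-1, 0), "Right": (0, 1), "Left": (0, -1)}
    ADJOINING = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
                 "Left": ("Up", "Down"), "Right": ("Up", "Down")}

    def step(cell, action):
        """One slippery step: 0.8 intended direction, 0.1 for each adjoining one."""
        u = random.random()
        if u < 0.8:
            direction = action
        elif u < 0.9:
            direction = ADJOINING[action][0]
        else:
            direction = ADJOINING[action][1]
        r, c = cell
        dr, dc = MOVES[direction]
        nr, nc = r + dr, c + dc
        # a move off the grid leaves the agent in place
        return (nr, nc) if 1 <= nr <= ROWS and 1 <= nc <= COLS else (r, c)

    counts = Counter()
    n_runs = 100_000
    for _ in range(n_runs):
        cell = (1, 1)
        for a in ["Up", "Up", "Right", "Right", "Right"]:
            cell = step(cell, a)
        counts[cell] += 1

    for cell, n in sorted(counts.items()):
        print(cell, n / n_runs)   # estimated probability of ending in each field

With enough runs the counts converge to the exact probability of ending in each reachable field.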
I think this example is from AIMA, right? Actually, I have a few questions about the approach myself; my answer may not be right if you approach it very theoretically.
import numpy as np

while not done:
    # epsilon-greedy exploration: with probability e pick a random action
    if np.random.rand() < e:
        action = env.action_space.sample()
    else:
        action = np.argmax(Q[state, :])   # greedy action from the Q-table
    new_state, reward, done, _ = env.step(action)
    # Q-learning update: lr is the learning rate, r is the discount factor
    Q[state, action] += lr * (reward + r * np.max(Q[new_state, :]) - Q[state, action])
    state = new_state                     # move on to the next state
This is a simple version of that loop that I coded with Gym (Q, state, done, e, lr and r have to be initialized before the loop).
Upvotes: 0