Skyfe

Reputation: 581

MDP: How to calculate the chances of each possible result for a sequence of actions?

I've got a MDP problem with the following environment (3x4 map):

[figure: the 3x4 grid-world map]

with the possible actions Up/Down/Right/Left, a 0.8 chance of moving in the intended direction, and a 0.1 chance for each perpendicular direction (e.g. for Up: 0.1 chance to go Left, 0.1 chance to go Right).

Now what I need to do is calculate the possible results starting in (1,1) running the following sequence of actions:

[Up, Up, Right, Right, Right]

And also calculate the chance of ending up in each field (for each possible outcome) with this action sequence. How can I do this efficiently, i.e. without enumerating all of the up to 3^5 possible outcome paths?
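For illustration, the kind of computation being asked about can be done by propagating a probability distribution over cells one action at a time: 5 passes over at most 11 cells instead of enumerating every outcome path. A minimal sketch, assuming the AIMA-style 4x3 layout (columns 1-4, rows 1-3, wall at (2,2), terminal fields at (4,3) and (4,2), and moves into walls or edges staying in place):

```python
# Forward propagation of a probability distribution over cells.
# Assumed layout (AIMA-style 4x3 world): wall at (2,2), terminals at
# (4,3) and (4,2); a move into a wall or off the grid stays in place.
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SLIPS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
         "Left": ("Up", "Down"), "Right": ("Up", "Down")}
WALLS, TERMINALS = {(2, 2)}, {(4, 3), (4, 2)}

def valid(cell):
    return 1 <= cell[0] <= 4 and 1 <= cell[1] <= 3 and cell not in WALLS

def step_dist(dist, action):
    """Push the distribution through one stochastic action."""
    out = {}
    for cell, p in dist.items():
        if cell in TERMINALS:              # terminal fields absorb their mass
            out[cell] = out.get(cell, 0.0) + p
            continue
        for move, q in [(action, 0.8),
                        (SLIPS[action][0], 0.1), (SLIPS[action][1], 0.1)]:
            dc, dr = MOVES[move]
            nxt = (cell[0] + dc, cell[1] + dr)
            if not valid(nxt):
                nxt = cell                 # blocked: stay in place
            out[nxt] = out.get(nxt, 0.0) + p * q
    return out

dist = {(1, 1): 1.0}                       # start in (1,1) with certainty
for a in ["Up", "Up", "Right", "Right", "Right"]:
    dist = step_dist(dist, a)
# dist now maps each reachable field to its probability; e.g. the goal
# (4,3) gets 0.8^5 + 0.1^4 * 0.8 = 0.32776 under this layout.
```

Each pass touches every cell with non-zero probability exactly once, so the cost is linear in the number of actions rather than exponential.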

Thanks in advance!

Upvotes: -1

Views: 358

Answers (1)

Max pridy

Reputation: 11

It sounds like you are treating this as a reinforcement-learning problem. Problems like this are usually solved with the Bellman equation and Q-learning.

You may also find this lecture useful: http://cs229.stanford.edu/notes/cs229-notes12.pdf

Once learning has finished, you can replay the whole process many times to estimate the probability of each outcome of [up, up, right, right, right].

After learning, the second constraint (avoiding exhaustive enumeration) matters less, because the agent converges to the correct answer almost immediately.

I think this example is from AIMA, right? To be fair, if you want to approach the problem purely theoretically, my Q-learning answer may not be exactly what you are after.

import gym
import numpy as np

env = gym.make("FrozenLake-v0")    # any discrete gym environment
Q = np.zeros([env.observation_space.n, env.action_space.n])
lr, r, e = 0.8, 0.95, 0.1          # learning rate, discount factor, epsilon
state, done = env.reset(), False

while not done:
    if np.random.rand() < e:                # epsilon-greedy exploration
        action = env.action_space.sample()
    else:
        action = np.argmax(Q[state, :])     # greedy action

    new_state, reward, done, _ = env.step(action)
    Q[state, action] += lr * (reward + r * np.max(Q[new_state, :]) - Q[state, action])
    state = new_state                       # advance to the next state

The code above is a simple Q-learning loop I wrote with gym.

Upvotes: 0
