Reputation: 33
The optimal state-action value given by the Bellman optimality equation (Sutton & Barto 2018, page 63) is

Q*(s,a) = Σ_{s',r} p(s',r | s,a) [ r + γ max_{a'} Q*(s',a') ]

and the Q-learning update is

Q(S_t, A_t) ← Q(S_t, A_t) + α [ R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t) ]
I know that Q-learning is model-free, so it doesn't need the transition probabilities for the next state.
However, p(s',r|s,a) in the Bellman equation is the probability of transitioning to the next state s' with reward r given s and a, so it seems that computing Q(s,a) requires the transition probabilities.
Is the Q of the Bellman equation different from the Q of Q-learning?
If they are the same, how can Q-learning work model-free?
Is there any way to get Q(s,a) in Q-learning without the transition probabilities?
Or am I confusing something?
Upvotes: 3
Views: 911
Reputation: 15488
Q-learning is derived from the Bellman optimality equation for the state-action value function. It is "model-free" in the sense that you don't need to know the transition function that determines, for a given action, which state comes next.
However, there are several related formulations that differ in what information is known. In particular, when you do know the transition function, you can and should use it directly: that is exactly the Bellman equation you cited, which takes an explicit expectation over p(s',r|s,a).
On the other hand, if you don't know the transition function, Q-learning still works, but you have to sample the effect of the transition function by interacting with the environment (or a simulator). Each update uses a single observed transition (s, a, r, s') instead of the full sum over p(s',r|s,a); because the learning rate makes the update a running average over many such samples, the expectation in the Bellman equation is estimated implicitly rather than computed from a model.
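To make that concrete, here is a minimal sketch on a hypothetical 3-state chain MDP (my own toy example, not from Sutton). Note that the environment's `step` function is treated as a black box: the Q-learning update only ever sees sampled `(s, a, r, s')` tuples and never reads p(s',r|s,a).

```python
import random

N_STATES = 3          # states 0 and 1; state 2 is terminal
ACTIONS = [0, 1]      # 0 = stay (reward 0), 1 = advance
GAMMA = 0.9
ALPHA = 0.5
EPSILON = 0.2

def step(s, a):
    """Environment simulator (the 'model'). The agent can only
    sample from it -- the learning code never inspects its
    transition probabilities."""
    if a == 1:
        s2 = s + 1
        return s2, (1.0 if s2 == 2 else 0.0)
    return s, 0.0

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s = 0
    while s != 2:
        # epsilon-greedy exploration
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r = step(s, a)
        # Q-learning update from a single sampled transition:
        # the max over a' mirrors the Bellman optimality equation,
        # but the expectation over p(s',r|s,a) is replaced by the sample.
        target = r if s2 == 2 else r + GAMMA * max(Q[(s2, a_)] for a_ in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print(round(Q[(1, 1)], 3))   # converges toward 1.0
print(round(Q[(0, 1)], 3))   # converges toward GAMMA * 1.0 = 0.9
```

The learned values match what the Bellman equation would give if you solved it with the known model, which is the sense in which the two Qs are the same function, obtained by different means.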
Upvotes: 2