Reputation: 45
As far as I understand Q-learning, a Q-value is a measure of "how good" a particular state-action pair is. This is usually represented in a table in one of the following ways (see fig.):
Upvotes: 0
Views: 1129
Reputation: 14031
No. In general, an action is not equivalent to a transition to a particular state. The number of actions can differ from the number of states, the same action can lead to different states depending on the state in which it is performed, and different actions can lead to the same state. Transitions can also be stochastic.
See (1).
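To make the distinction concrete, here is a minimal sketch of a toy MDP. The state names, action names, and probabilities are all made up for illustration and are not taken from the question. The Q-table is keyed by (state, action), while a separate transition model decides which state an action actually leads to.

```python
import random

# Hypothetical toy MDP: 3 states but only 2 actions (illustrative names).
STATES = ["s0", "s1", "s2"]
ACTIONS = ["left", "right"]

# Transition model: (state, action) -> list of (next_state, probability).
TRANSITIONS = {
    ("s0", "left"):  [("s0", 1.0)],
    ("s0", "right"): [("s1", 0.8), ("s2", 0.2)],  # stochastic outcome
    ("s1", "left"):  [("s0", 1.0)],
    ("s1", "right"): [("s2", 1.0)],  # "right" leads elsewhere than from s0
    ("s2", "left"):  [("s2", 1.0)],
    ("s2", "right"): [("s2", 1.0)],  # two different actions, same next state
}

# The Q-table has one entry per (state, action) pair, not per (state, next state).
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}


def step(state, action, rng=random):
    """Sample the next state for taking `action` in `state`."""
    next_states, probs = zip(*TRANSITIONS[(state, action)])
    return rng.choices(next_states, weights=probs, k=1)[0]
```

With this setup the table has 3 x 2 = 6 entries rather than 3 x 3, and `step("s0", "right")` can return either `"s1"` or `"s2"`, so the action alone does not identify the resulting state.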
Upvotes: 2