Reputation: 333
In my understanding, a reinforcement learning agent gets a reward for each action it takes.
However, when playing a video game (e.g. Street Fighter), most steps give no reward (reward == 0), and a reward only arrives at the end (e.g. the player wins, reward = 1). With so many actions along the way, how does the machine know which ones were key to winning the game?
Upvotes: 2
Views: 144
Reputation: 6276
In reinforcement learning, the reward can be immediate or delayed [1].
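Reinforcement learning algorithms such as Q-learning choose the action with the highest expected cumulative reward. This estimate is continuously updated with the current reward (r at time t) and with possible future rewards (the last term in the update, max Q, based on actions from time t+1 onward). In its standard form, with learning rate α and discount factor γ, the Q-learning update is:

$$Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) \right]$$

Because each update pulls in the best value of the next state, a reward received only at the end of the game is passed backwards, one step per update, to the earlier actions that led to it.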
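To illustrate how a delayed reward reaches earlier actions, here is a minimal sketch of tabular Q-learning on a toy problem. Everything in it is an assumption made for illustration and not taken from any particular game or library: a 6-state corridor where the only reward (1) is given on reaching the last state, a purely random exploration policy, and the hypothetical names `step` and `train`.

```python
import random

N_STATES = 6          # states 0..5; state 5 is the terminal "win" state
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (illustrative values)


def step(state, action):
    """Toy environment: reward is 0 everywhere except on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Run tabular Q-learning and return the table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Act randomly; Q-learning is off-policy, so it can still
            # learn the values of the greedy policy from random play.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Best value reachable from the next state (0 once the game is over).
            future = 0.0 if done else max(q[next_state])
            # Q-learning update: blend the old estimate with reward + discounted future.
            q[state][action] = (1 - ALPHA) * q[state][action] + ALPHA * (reward + GAMMA * future)
            state = next_state
    return q


if __name__ == "__main__":
    q = train()
    for s in range(N_STATES - 1):
        print(s, [round(v, 3) for v in q[s]])
```

In this sketch only the very last move is ever rewarded, yet after training the "right" action should have the higher value in every state, with values shrinking by a factor of γ for each extra step away from the goal. That is the sense in which the algorithm works out which earlier actions mattered: their value is inherited from the states they lead to.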
More detailed information about (Deep) Reinforcement Learning, with some examples of applications to games, is given in A Beginner's Guide to Deep Reinforcement Learning.
Upvotes: 2