Reputation: 85
In supervised learning, batch size means the number of samples the neural network is trained on per update. However, what does batch size mean in the context of reinforcement learning? Does it also refer to samples? If so, what does a sample mean in the context of reinforcement learning?
Upvotes: 4
Views: 7513
Reputation: 2975
Batch size does indeed mean the same thing in reinforcement learning as it does in supervised learning. The intuition behind "batch learning" (usually mini-batch learning) is two-fold:
In supervised learning, e.g. when training neural networks, you perform mini-batch gradient descent to update your network. In deep reinforcement learning, you are training the same kind of neural network, so it works in the same way.
In supervised learning, your batch consists of a set of features and their respective labels. In deep reinforcement learning, it is similar, except that each sample is a tuple (state, action, reward, state at t + 1, and sometimes done):
State: The original state that describes your environment
Action: The action you performed in that state
Reward: Reward signal obtained after performing that action in that state
State t+1: The new state your action transitioned you to.
Done: A boolean marking the end of your task. For example, if you train an RL agent to play chess, done would correspond to winning or losing the game.
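For concreteness, here is a minimal sketch (plain Python with NumPy, not part of the original answer; the names `ReplayBuffer`, `add`, and `sample` are illustrative) of a replay buffer that stores such transition tuples and draws a mini-batch of a given batch size:

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10_000):
        # Oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # "Batch size" in deep RL: the number of transitions drawn per update.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones
```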
You would sample a batch of these (s, a, r, s(t+1), done) tuples. Then you feed it into the TD update rule, usually of the form:

$$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s_{t+1}, a')$$
The two Q's are the action values, and are calculated by passing s, s(t+1) and a into your neural network.
Then, you would update your neural network with this target Q value as the label.
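Sketching that step in code (a minimal NumPy version, not from the original answer; the function name `td_targets` and its arguments are illustrative), this is how a sampled batch is turned into labels for the network:

```python
import numpy as np


def td_targets(q_values, next_q_values, actions, rewards, dones, gamma=0.99):
    """Build training labels for a sampled batch.

    q_values      : (batch_size, n_actions) array, Q(s, .) from the network
    next_q_values : (batch_size, n_actions) array, Q(s_{t+1}, .) from the network
    actions, rewards, dones : (batch_size,) arrays from the replay buffer
    """
    targets = q_values.copy()
    # Bootstrapped target r + gamma * max_a' Q(s_{t+1}, a');
    # the bootstrap term is dropped when the episode ended (done = True).
    bootstrap = gamma * next_q_values.max(axis=1) * (1.0 - dones)
    # actions must be an integer array to index the chosen action per row.
    targets[np.arange(len(actions)), actions.astype(int)] = rewards + bootstrap
    return targets
```

You would then take an ordinary supervised-learning step, e.g. a gradient step on the mean-squared error between the network's predictions on `states` and these `targets`, which is exactly the sense in which batch size plays the same role here as in supervised learning.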
Upvotes: 6