Reputation: 427
I would like to cross-check my understanding of reinforcement learning. How easy or difficult, and how common, is it to train a policy and then reuse the learned policy later on? My understanding so far is that if we stop training and then start again, we would need to start from scratch, i.e. we would not be able to benefit from the learned policy. Thank you.
Upvotes: 2
Views: 135
Reputation: 2312
It depends on the specific method you are using, but generally, once a learning method converges, there is no need to “train” any further. In the case of Q-learning, for example, which is a model-free, off-policy approach to learning, the agent must still take random actions before the algorithm converges, to ensure that every relevant point in the Q(s,a) space has been explored. But each individual step takes advantage of the experience gained from prior episodes, so it would be incorrect to say that you start from scratch each episode.
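To make the reuse point concrete, here is a minimal sketch of tabular Q-learning where the learned Q-table is written to disk, loaded back, and then either used greedily or trained further. Everything specific in it is an assumption for illustration only: the five-state chain environment, the hyperparameters, the file name `q_table.json`, and the helper names `step`, `train` and `greedy_policy` are made up and do not come from any particular library.

```python
import json
import os
import random
import tempfile

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1 when the goal state is reached, else 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(q, episodes, rng, alpha=0.5, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning with epsilon-greedy exploration; updates q in place."""
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # exploit, breaking ties between equal Q-values at random
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


rng = random.Random(0)

# First training session, starting from an all-zero Q-table.
q = [[0.0, 0.0] for _ in range(N_STATES)]
train(q, episodes=50, rng=rng)

# Stop training and persist what was learned.
path = os.path.join(tempfile.mkdtemp(), "q_table.json")
with open(path, "w") as f:
    json.dump(q, f)

# Later: load the table instead of reinitialising it.
with open(path) as f:
    q_loaded = json.load(f)

# Option 1: just use the learned policy, with no further training.
policy = greedy_policy(q_loaded)
print(policy)  # [1, 1, 1, 1], i.e. always move right towards the goal

# Option 2: resume training from the saved values rather than from zeros.
train(q_loaded, episodes=50, rng=rng)
```

The only thing that would force a restart from scratch is throwing the Q-table away. With function approximation (for example a neural network) the same idea should apply to the saved weights, although the details depend on the framework you use.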
Upvotes: 2