Mike de Klerk

Reputation: 12328

What is the difference between the `policy` and `collect_policy` of a tf-agent?

I am looking at TF-Agents to learn about reinforcement learning, and I am following this tutorial. The tutorial uses a different policy for training, called `collect_policy`, than for evaluation (`policy`).

The tutorial states that there is a difference, but IMO it does not explain why there are two policies, since it does not describe a functional difference between them:

Agents contain two policies:

agent.policy — The main policy that is used for evaluation and deployment.

agent.collect_policy — A second policy that is used for data collection.

I've looked at the source code of the agent. It says

policy: An instance of tf_policy.Base representing the Agent's current policy.

collect_policy: An instance of tf_policy.Base representing the Agent's current data collection policy (used to set self.step_spec).

But I do not see `self.step_spec` anywhere in the source file. The closest thing I can find is `time_step_spec`, but that is the first constructor argument of the `TFAgent` class, so it makes no sense for it to be set via a `collect_policy`.

So the only thing I could think of was to put it to the test: I used `policy` instead of `collect_policy` for training, and the agent reached the maximum score in the environment nonetheless.
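Roughly, the swap looked like this (a minimal sketch based on the tutorial's data-collection pattern; the `collect_step` helper and variable names here are illustrative, not the tutorial's exact code):

```python
# Illustrative per-step collection function in the style of the TF-Agents DQN tutorial.
from tf_agents.trajectories import trajectory

def collect_step(environment, policy, replay_buffer):
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    replay_buffer.add_batch(traj)

# The tutorial collects experience with the exploratory policy:
#   collect_step(train_env, agent.collect_policy, replay_buffer)
# For my test I collected with the evaluation policy instead:
#   collect_step(train_env, agent.policy, replay_buffer)
```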

So what is the functional difference between the two policies?

Upvotes: 5

Views: 1284

Answers (1)

nbro

Reputation: 15847

Some reinforcement learning algorithms, such as Q-learning, use one policy to behave in (or interact with) the environment and collect experience, which is different from the policy they are trying to learn (sometimes known as the target policy). These algorithms are known as off-policy algorithms. An algorithm that is not off-policy is known as on-policy (i.e. the behaviour policy is the same as the target policy); an example of an on-policy algorithm is SARSA. That's why we have both `policy` and `collect_policy` in TF-Agents: in general, the behaviour policy can be different from the target policy (though this may not always be the case).
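You can see this concretely in TF-Agents. For a `DqnAgent`, for instance, the two attributes are typically different policy objects: `policy` is the greedy (target) policy over the Q-network, while `collect_policy` wraps it with epsilon-greedy exploration. A minimal sketch (the CartPole setup here is just for illustration):

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network

# Small CartPole setup, just to build an agent and inspect its two policies.
env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))
q_net = q_network.QNetwork(env.observation_spec(), env.action_spec(),
                           fc_layer_params=(100,))
agent = dqn_agent.DqnAgent(
    env.time_step_spec(), env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    epsilon_greedy=0.1)  # exploration rate used only by collect_policy
agent.initialize()

print(type(agent.policy).__name__)          # greedy target policy (e.g. GreedyPolicy)
print(type(agent.collect_policy).__name__)  # exploratory behaviour policy (e.g. EpsilonGreedyPolicy)
```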

Why should this be the case? Because while learning and interacting with the environment, you need to explore it (i.e. take some random actions), whereas once you have learned the near-optimal policy, you may not need to explore anymore and can just take the near-optimal action (I say near-optimal rather than optimal because you may not have learned the optimal one).
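In practice, with TF-Agents this usually means driving data collection with `agent.collect_policy` and evaluating with `agent.policy`, roughly like this (a sketch reusing the setup above; the `evaluate` helper is mine, not part of the library):

```python
# Greedy evaluation: agent.policy takes no random exploration actions.
def evaluate(environment, policy, num_episodes=10):
    total_return = 0.0
    for _ in range(num_episodes):
        time_step = environment.reset()
        while not time_step.is_last():
            action_step = policy.action(time_step)
            time_step = environment.step(action_step.action)
            total_return += float(time_step.reward.numpy()[0])
    return total_return / num_episodes

# During training you would collect experience with agent.collect_policy
# (e.g. via a driver or a per-step collection function), and periodically
# measure progress with the greedy policy:
#   avg_return = evaluate(eval_env, agent.policy)
```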

Upvotes: 3
