Reputation: 23
For example, I have tried running TD(λ) on a random MDP, and I noticed that I got different policies depending on the value of λ. Can TD(1) and TD(0) give different optimal policies?
Update: Increasing my initial value function gave me the same result for both cases.
Upvotes: 0
Views: 182
Reputation: 6679
Yes. In general, RL methods with convergence guarantees are only guaranteed to converge to *an* optimal policy, not to a particular one. So, if an MDP has several optimal policies, different algorithms (including policy iteration methods), or the same algorithm with different settings, can converge to different optimal policies.
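Here is a minimal sketch of that point (not your MDP, and using value iteration rather than TD, purely to illustrate the tie-breaking issue): a hypothetical 3-state MDP in which two actions from the start state are exactly equally good, so it has two optimal policies, and which one you get depends only on how ties are broken.

```python
import numpy as np

# Hypothetical MDP: from state 0, actions 0 and 1 lead to different absorbing
# states but yield identical reward, so both greedy policies are optimal.
gamma = 0.9
n_states, n_actions = 3, 2

# P[a, s, s'] = transition probability, R[a, s] = expected reward (assumed values).
P = np.zeros((n_actions, n_states, n_states))
R = np.zeros((n_actions, n_states))

P[0, 0, 1] = 1.0; R[0, 0] = 1.0   # action 0: state 0 -> state 1, reward 1
P[1, 0, 2] = 1.0; R[1, 0] = 1.0   # action 1: state 0 -> state 2, reward 1
P[:, 1, 1] = 1.0                  # states 1 and 2 are absorbing with reward 0,
P[:, 2, 2] = 1.0                  # so their values are identical

def value_iteration(P, R, gamma, iters=500):
    """Return the action-value table Q[a, s] after running value iteration."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * np.einsum('asx,x->as', P, V)  # one-step lookahead
        V = Q.max(axis=0)
    return Q

Q = value_iteration(P, R, gamma)
print("Q-values at state 0:", Q[:, 0])                 # identical for both actions
print("greedy, ties broken toward action 0:", Q[:, 0].argmax())
print("greedy, ties broken toward action 1:", 1 - Q[::-1, 0].argmax())
```

Both printed policies are optimal; they just pick different actions at the tie. A TD-based method adds sampling noise and initial-value effects on top of this, which is consistent with your update: changing the initialization changed which of the (equally good) policies you ended up with.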
Upvotes: 1