Bilal Hejase

Reputation: 23

Can different policy iteration methods converge to different optimal policies?

For example, I ran lambda iteration on a random MDP and noticed that I get different policies depending on the value of lambda. Can TD(1) and TD(0) give different optimal policies?

Update: Increasing my initial value function gave me the same result for both cases.

Upvotes: 0

Views: 182

Answers (1)

Pablo EM

Reputation: 6679

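Yes. In general, RL methods with convergence guarantees converge to an optimal policy, not to one particular optimal policy. All optimal policies share the same optimal value function, so if an MDP has several of them, an algorithm (including policy iteration methods) can end up at any one, depending on details such as initialization and tie-breaking.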
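To make this concrete, here is a minimal sketch in Python. The MDP is made up for illustration (it is not the random MDP from the question): two states, two actions, and in state 0 both actions give the same reward and lead to the same absorbing state. The names `NEXT_STATE`, `REWARD`, `evaluate` and `policy_iteration` are assumptions of this sketch, and the improvement step is assumed to keep the current action on ties.

```python
# Hypothetical 2-state, 2-action deterministic MDP (not from the question).
# In state 0 both actions give reward 1 and move to absorbing state 1,
# so both actions are optimal there.
GAMMA = 0.9
NEXT_STATE = [[1, 1], [1, 1]]        # NEXT_STATE[s][a]
REWARD = [[1.0, 1.0], [0.0, 0.0]]    # REWARD[s][a]


def evaluate(policy, sweeps=200):
    """Iterative policy evaluation for a deterministic policy."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [REWARD[s][policy[s]] + GAMMA * v[NEXT_STATE[s][policy[s]]]
             for s in range(2)]
    return v


def policy_iteration(policy):
    """Policy iteration that keeps the current action on ties."""
    policy = list(policy)
    while True:
        v = evaluate(policy)
        stable = True
        for s in range(2):
            q = [REWARD[s][a] + GAMMA * v[NEXT_STATE[s][a]] for a in range(2)]
            best = max(range(2), key=lambda a: q[a])
            # Switch only on a strict improvement, so ties keep the old action.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, v


# Same MDP, same algorithm, different initial policies.
policy_a, value_a = policy_iteration([0, 0])
policy_b, value_b = policy_iteration([1, 0])
print(policy_a, value_a)
print(policy_b, value_b)
```

The two runs stop at different policies in state 0, yet both report the same value function, so both policies are optimal. If the policies you get for different lambda values have different value functions, that is a different situation: at least one of the runs has not reached an optimal policy.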

Upvotes: 1
