Reputation: 44
I use torch.optim.Adam with an exponential learning-rate decay in my PPO algorithm:
self.optimizer = torch.optim.Adam([
    {'params': self.policy.actor.parameters(), 'lr': lr_actor},
    {'params': self.policy.critic.parameters(), 'lr': lr_critic}
])
self.scheduler = torch.optim.lr_scheduler.ExponentialLR(self.optimizer, self.GAMMA)
The initial learning rate is lr=0.1 and GAMMA=0.9.
Then I print the lr dynamically during training with:
if time_step % update_timestep == 0:
    ppo_agent.update()
    print(f'__________start update_______________')
    print(ppo_agent.optimizer.state_dict()['param_groups'][0]['lr'])
But something goes wrong, and the error is:
File "D:\Anaconda\lib\site-packages\torch\distributions\beta.py", line 36, in __init__
self._dirichlet = Dirichlet(concentration1_concentration0, validate_args=validate_args)
File "D:\Anaconda\lib\site-packages\torch\distributions\dirichlet.py", line 52, in __init__
super(Dirichlet, self).__init__(batch_shape, event_shape, validate_args=validate_args)
File "D:\Anaconda\lib\site-packages\torch\distributions\distribution.py", line 53, in __init__
raise ValueError("The parameter {} has invalid values".format(param))
ValueError: The parameter concentration has invalid values
However, if I delete the print() statement, everything works fine, which is what puzzles me.
Upvotes: 0
Views: 1627
Reputation: 3496
You can get the learning rate like this:
self.optimizer.param_groups[0]["lr"]
Upvotes: 1