Reputation: 83
I have two nets sharing one optimizer, each with its own learning rate. Simplified code below:
optim = torch.optim.Adam([
    {'params': A.parameters(), 'lr': args.A},
    {'params': B.parameters(), 'lr': args.B},
])
Is this correct? I ask because when I inspect the parameters in the optimizer (with the code below), I find only 2 parameters.
for p in optim.param_groups:
    outputs = ''
    for k, v in p.items():
        if k == 'params':
            outputs += (k + ': ' + str(v[0].shape).ljust(30) + ' ')
        else:
            outputs += (k + ': ' + str(v).ljust(10) + ' ')
    print(outputs)
Only 2 parameters are printed:
params: torch.Size([16, 1, 80]) lr: 1e-05 betas: (0.9, 0.999) eps: 1e-08 weight_decay: 0 amsgrad: False
params: torch.Size([30, 10]) lr: 1e-05 betas: (0.9, 0.999) eps: 1e-08 weight_decay: 0 amsgrad: False
In fact, the two nets have more than 100 parameters between them. I expected all of them to be printed. Why is this happening? Thank you!
Upvotes: 0
Views: 1724
Reputation: 114786
You only print the first tensor of each param group:
if k == 'params':
    outputs += (k + ': ' + str(v[0].shape).ljust(30) + ' ')  # only v[0] is printed!
Try printing all the parameters instead:
if k == 'params':
    outputs += (k + ': ')
    for vp in v:
        outputs += (str(vp.shape).ljust(30) + ' ')
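As a sanity check, here is a minimal, self-contained sketch (using small hypothetical Linear modules in place of your A and B, since their definitions aren't shown) that counts the tensors in each param group. It shows that every registered parameter of each net is in the optimizer, and that each group keeps its own lr:

import torch

# Hypothetical stand-ins for A and B; replace with your real models.
A = torch.nn.Linear(80, 16)
B = torch.nn.Linear(10, 30)

optim = torch.optim.Adam([
    {'params': A.parameters(), 'lr': 1e-5},
    {'params': B.parameters(), 'lr': 1e-4},
])

# Each param group holds *all* parameter tensors of the corresponding net.
for i, group in enumerate(optim.param_groups):
    shapes = [tuple(p.shape) for p in group['params']]
    print(f"group {i}: lr={group['lr']}, {len(shapes)} tensors, shapes={shapes}")

Here each group reports two tensors (weight and bias); with your real models you should see every parameter of A in the first group and every parameter of B in the second, so your optimizer setup is fine.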
Upvotes: 1