Reputation: 27946
I am trying to implement the algorithm from section 13.5 of this book in PyTorch.
It requires two separate neural networks (in this question, model1 and model2). One network's loss depends only on its own output [via delta] (parameterized by w); the other (parameterized by theta) depends both on its own output [via ln(pi)] and on the first network's output [again, via delta].
I want to update each one separately. Assume the following models implement nn.Module:
model1 = Mynet1()
model2 = Mynet2()

optimizer1 = optim.Adam(model1.parameters(), lr1)
optimizer2 = optim.Adam(model2.parameters(), lr2)

val1 = model1(input1)
val2 = model2(input2)

loss1 = f(val1)
loss2 = f(val1, val2)  # THIS IS THE INTERESTING PART

optimizer1.zero_grad()
loss1.backward()
optimizer1.step()

optimizer2.zero_grad()
loss2.backward()
optimizer2.step()
I understand that calling backward() on loss1 and then stepping its optimizer updates model1's parameters.
My question is: what happens when I do the same with loss2 and optimizer2, given that loss2 depends on outputs from both model1 and model2?
How can I make the loss2 update not affect model1's parameters?
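To make the concern concrete, here is a minimal runnable sketch (using plain nn.Linear layers as hypothetical stand-ins for Mynet1/Mynet2 and an elementwise product as a toy f): a loss built from both outputs populates gradients on both models when backward() is called.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model1 = nn.Linear(4, 1)  # stand-in for Mynet1
model2 = nn.Linear(4, 1)  # stand-in for Mynet2

val1 = model1(torch.randn(1, 4))
val2 = model2(torch.randn(1, 4))

# toy f(val1, val2): any differentiable combination of the two outputs
loss2 = (val1 * val2).sum()
loss2.backward()

# backward() walks the whole graph, so BOTH models receive gradients,
# even if we only intend to step model2's optimizer.
print(model1.weight.grad is not None)
print(model2.weight.grad is not None)
```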
Upvotes: 1
Views: 1056
Reputation: 16440
Since optim2 has only model2's parameters, it will only update model2 when you call optim2.step(), as is being done.
However, loss2.backward() will compute gradients for both model1's and model2's parameters, and if you call optim1.step() after that, it will update model1's parameters. If you don't want to compute gradients for model1's parameters, you can use val1.detach() (i.e., build loss2 from the detached value) to cut it out of the computational graph.
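A short sketch of the detach approach, again with nn.Linear layers as hypothetical stand-ins for the two models: building loss2 from val1.detach() cuts the graph, so backward() leaves model1 untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model1 = nn.Linear(4, 1)  # stand-in for Mynet1
model2 = nn.Linear(4, 1)  # stand-in for Mynet2

val1 = model1(torch.randn(1, 4))
val2 = model2(torch.randn(1, 4))

# detach() returns a tensor that shares data with val1 but is not
# connected to model1 in the autograd graph
loss2 = (val1.detach() * val2).sum()
loss2.backward()

print(model1.weight.grad)              # None: no gradient reaches model1
print(model2.weight.grad is not None)  # model2 still gets its gradient
```

This way it is safe to call loss2.backward() in any order relative to the loss1 update, since model1's gradients are never touched by loss2.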
Upvotes: 3