Gulzar

Reputation: 27946

pytorch: How does loss behave when coming from two networks?

I am trying to implement the following algorithm, from section 13.5 of this book, in PyTorch.

[image: pseudocode of the algorithm from section 13.5 of the book]

This requires two separate neural networks (in this question, model1 and model2). One network's loss, parameterized by w, depends only on its own output [via delta]; the other's, parameterized by theta, depends both on its own output [via ln(pi)] and on the first network's output [again, via delta].
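For context, here is a minimal, self-contained sketch of two losses with that dependency structure. The names critic, actor, state, reward, gamma, and action are hypothetical stand-ins (not taken from the book's pseudocode), and squaring the TD error is just one common way of turning delta into a loss:

import torch
import torch.nn as nn

# hypothetical stand-ins: critic ~ v(S, w), actor ~ pi(A | S, theta)
critic = nn.Linear(4, 1)
actor = nn.Sequential(nn.Linear(4, 2), nn.Softmax(dim=-1))

state = torch.randn(4)
next_state = torch.randn(4)
reward, gamma, action = 1.0, 0.99, 0

# delta depends only on the critic's outputs (i.e. on w)
delta = reward + gamma * critic(next_state) - critic(state)

# loss for the critic: depends only on its own output, via delta
critic_loss = delta.pow(2).mean()

# loss for the actor: depends on its own output via ln(pi),
# and on the critic's output via delta
log_pi = torch.log(actor(state)[action])
actor_loss = -delta * log_pi

The second loss containing delta is exactly the situation described below: backpropagating through it reaches the critic's parameters unless that path is cut.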

I want to update each one separately.

Assume the following models implement nn.Module:

model1 = Mynet1()
model2 = Mynet2()

val1 = model1(input1)
val2 = model2(input2)

optim1 = optim.Adam(model1.parameters(), lr1)
optim2 = optim.Adam(model2.parameters(), lr2)

loss1 = f(val1)
loss2 = f(val1, val2)  # THIS IS THE INTERESTING PART

optim1.zero_grad()
loss1.backward()
optim1.step()

optim2.zero_grad()
loss2.backward()
optim2.step()

I understand that calling backward() on loss1 and then stepping its optimizer updates model1's parameters.

My question is: what happens when I do the same with loss2, model2, and optim2, given that loss2 depends on outputs from both model1 and model2?

How can I make the loss2 update not affect model1's parameters?

Upvotes: 1

Views: 1056

Answers (1)

Umang Gupta

Reputation: 16440

Since optim2 holds only model2's parameters, it will only update model2 when you call optim2.step(), as is being done.

However, loss2.backward() will compute gradients for both model1's and model2's parameters, and if you then call optim1.step(), it will update model1's parameters. If you don't want gradients computed for model1's parameters, use val1.detach() (which returns a new tensor cut off from the computational graph) when building loss2.
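A minimal sketch of that suggestion, reusing the names from the question (f, model1, model2, optim1, and optim2 are assumed to be defined as above); only the detach() call changes:

val1 = model1(input1)
val2 = model2(input2)

loss1 = f(val1)
# detach() returns a new tensor that shares val1's data but is cut off from
# the computational graph, so loss2's backward pass never reaches model1
loss2 = f(val1.detach(), val2)

optim1.zero_grad()
loss1.backward()
optim1.step()

optim2.zero_grad()
loss2.backward()  # now only model2's parameters receive gradients
optim2.step()

Detaching also means loss1 and loss2 no longer share model1's part of the graph, so loss2.backward() will not complain about that part of the graph already having been freed by loss1.backward().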

Upvotes: 3
