Reputation: 14764
I have a model that's quite complicated, so I can't just call self.fc.weight etc.; instead I want to iterate over the model's parameters in some way.
The goal is to merge models this way: m = alpha * n + (1 - alpha) * o, where m, n, and o are instances of the same class but trained differently. So for each parameter in these models, I want to assign initial values to m based on n and o as described in the equation, and then continue the training procedure with m only.
I tried:
for p1, p2, p3 in zip(m.parameters(), n.parameters(), o.parameters()):
p1 = alpha * p2 + (1 - alpha) * p3
But this does not assign new values within m. I then tried:
for p1, p2, p3 in zip(m.parameters(), n.parameters(), o.parameters()):
p1.fill_(alpha * p2 + (1 - alpha) * p3)
But this throws:
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
And so I resorted to a working solution:
m.load_state_dict({
k: alpha * v1 + (1 - alpha) * v2
for (k, v1), (_, v2) in zip(n.state_dict().items(), o.state_dict().items())
})
Is there a better way to do this in PyTorch? Is it possible that I will get gradient errors?
Upvotes: 1
Views: 1950
Reputation: 1240
If I understand you correctly, you need to get out from under PyTorch's autograd mechanics, which you can do by simply writing
p1.data = alpha * p2.data + (1 - alpha) * p3.data
The parameter's data is not stored in the parameter object itself, but in its data member.
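For illustration, here is a minimal, self-contained sketch of the merge loop using this approach; the Net class and the alpha value are made up for the example and stand in for your more complicated model:
import torch
import torch.nn as nn

# stand-in for the question's more complicated model (hypothetical)
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

alpha = 0.7
n, o = Net(), Net()   # pretend these were trained differently
m = Net()             # the merged model that will be trained further

for p1, p2, p3 in zip(m.parameters(), n.parameters(), o.parameters()):
    # writing to .data bypasses autograd, so no in-place error is raised
    p1.data = alpha * p2.data + (1 - alpha) * p3.data
Since the gradients are never involved in this copy, you can continue training m afterwards as usual. On recent PyTorch versions you can achieve the same effect with p1.copy_(...) inside a torch.no_grad() block, which avoids touching .data directly.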
Upvotes: 3