Reputation: 14764
I have a model that's quite complicated, so I can't just call self.fc.weight etc.; instead I want to iterate over the model's parameters in some way.
The goal is to merge models this way: m = alpha * n + (1 - alpha) * o, where m, n, and o are instances of the same class but trained differently. So for each parameter in these models, I want to assign initial values to m based on n and o as described in the equation, and then continue the training procedure with m only.
I tried:
for p1, p2, p3 in zip(m.parameters(), n.parameters(), o.parameters()):
p1 = alpha * p2 + (1 - alpha) * p3
But this does not assign new values within m. I then tried:
for p1, p2, p3 in zip(m.parameters(), n.parameters(), o.parameters()):
p1.fill_(alpha * p2 + (1 - alpha) * p3)
But this throws:
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
And so I resorted to a working solution:
m.load_state_dict({
k: alpha * v1 + (1 - alpha) * v2
for (k, v1), (_, v2) in zip(n.state_dict().items(), o.state_dict().items())
})
Is there a better way to do this in PyTorch? Is it possible that I will get gradient errors?
Upvotes: 1
Views: 1950
Reputation: 1240
If I understand you correctly, you need to get out from under PyTorch's autograd mechanics, which you can do by simply writing
p1.data = alpha * p2.data + (1 - alpha) * p3.data
The parameter's data is not stored in the parameter object itself, but in its data member.
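For illustration, here is a minimal, self-contained sketch of the merge loop using this approach; the Net class and the alpha value are made up for the example and stand in for your more complicated model:
import torch
import torch.nn as nn

# stand-in for the question's more complicated model (hypothetical)
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

alpha = 0.7
n, o = Net(), Net()   # pretend these were trained differently
m = Net()             # the merged model that will be trained further

for p1, p2, p3 in zip(m.parameters(), n.parameters(), o.parameters()):
    # writing to .data bypasses autograd, so no in-place error is raised
    p1.data = alpha * p2.data + (1 - alpha) * p3.data
Since the gradients are never involved in this copy, you can continue training m afterwards as usual. On recent PyTorch versions you can achieve the same effect with p1.copy_(...) inside a torch.no_grad() block, which avoids touching .data directly.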
Upvotes: 3