Prospero

Reputation: 109

In Pytorch, what is the most efficient way to copy the learned params of a model as the initialization for a second model of the same architecture?

I have a CNN Model which has the following architecture:

import torch.nn as nn

class Model(nn.Module):

    def __init__(self):
        super().__init__()

        # Conv2d(in_channels, out_channels, kernel_size, stride)
        self.conv1 = nn.Conv2d(4, 32, (8, 8), 4)
        self.conv2 = nn.Conv2d(32, 64, (4, 4), 2)
        self.conv3 = nn.Conv2d(64, 64, (3, 3), 1)
        self.dense = nn.Linear(4*4*64, 512)
        self.out = nn.Linear(512, 18)

I am training it using a certain optimizer. I then want to use the learned parameters from this first model as the initialization for a second model of the exact same architecture (as opposed to using, say, Xavier initialization). I am aware of model_object.apply(initialization_function), but what is the most efficient way to implement the initialization scheme I described, where the learned parameters of one model serve as the initialization for a new model?

Upvotes: 1

Views: 565

Answers (1)

Olivier Cruchant

Reputation: 4037

If you want to load model1's parameters into model2, I believe this would work:

model2.load_state_dict(model1.state_dict())
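
For completeness, here is a minimal end-to-end sketch of what that looks like, assuming the Model class from the question (the training loop is elided):

model1 = Model()
# ... train model1 with your optimizer of choice ...

model2 = Model()
# Copy model1's learned parameters (and buffers) into model2.
# strict=True (the default) checks that both state dicts have exactly
# the same keys, so an architecture mismatch fails loudly here.
model2.load_state_dict(model1.state_dict())

Since both models have the exact same architecture, the default strict key matching will succeed, and model2 now starts from model1's learned weights; no apply()-based initialization function is needed for this case.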

See an example of something similar in the official PyTorch transfer learning tutorial.
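
If you just need a second model object that starts from identical weights, cloning the whole module with copy.deepcopy is an alternative worth knowing (a sketch, not part of the original answer):

import copy

# deepcopy clones the module along with its parameters and buffers,
# so model2 begins from exactly model1's learned state.
model2 = copy.deepcopy(model1)

The state_dict route is preferable when model2 was constructed separately (e.g. on a different device), while deepcopy is convenient when you simply want an independent copy of an existing model.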

Upvotes: 1
