farzaneh

Reputation: 41

Transferring Tensorflow weights to an equivalent Pytorch model

I have an old U-Net implementation in TensorFlow that was trained on custom data, and I saved its weights in .hdf5 format. I now want to move my code to PyTorch and have already implemented an equivalent model there. However, I am having trouble reusing the weights in the new PyTorch model. To convert them, I copy the weights from TensorFlow, layer by layer, into a state_dict for my PyTorch model (as shown in the code) and then load the model from that dictionary. The resulting PyTorch model does not produce output similar to the TensorFlow model's; the output is garbage.

Is there anything I am missing here? Note that for each layer I had to transpose the weights to match the PyTorch format. I think the problem is there, but I don't know how to fix it. Any guidance on how to approach this problem would also be helpful.
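
For reference, the axis orders involved can be checked with NumPy alone. The sketch below assumes the usual layouts, (height, width, in_channels, out_channels) for a Keras Conv2D kernel and (out_channels, in_channels, height, width) for a PyTorch Conv2d weight; the array is made-up data, not weights from the model in question. It shows that a bare .transpose() reverses every axis, which gives the right shape for square kernels but also swaps height and width inside each filter.

```python
import numpy as np

# Hypothetical Keras-style Conv2D kernel: (height, width, in_channels, out_channels)
kh, kw, c_in, c_out = 3, 3, 2, 4
kernel_tf = np.arange(kh * kw * c_in * c_out, dtype=np.float32).reshape(kh, kw, c_in, c_out)

# A bare transpose reverses all axes: (out, in, width, height)
reversed_axes = kernel_tf.transpose()

# An explicit permutation keeps height before width: (out, in, height, width)
explicit = np.transpose(kernel_tf, (3, 2, 0, 1))

# The shapes agree for square kernels, so a shape check alone would not notice.
print(reversed_axes.shape, explicit.shape)

# The values differ: each filter in `reversed_axes` is the spatial transpose
# of the corresponding filter in `explicit`.
same_values = np.array_equal(reversed_axes, explicit)
spatially_flipped = np.array_equal(reversed_axes, np.swapaxes(explicit, 2, 3))
print(same_values, spatially_flipped)
```

Under these layout assumptions, a Conv2D kernel would be permuted with (3, 2, 0, 1) rather than with a bare transpose.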

import numpy as np
import tensorflow as tf
import torch

def weight_loading(pretrained_weights):
    # Load the weights
    tf_model = tf.keras.models.load_model(pretrained_weights)
    tf_weights = tf_model.get_weights()

    # Load the PyTorch model
    pt_model = UNet() #implemented based on the previous model (by myself)
    initial_state_dict = pt_model.state_dict()
    new_state_dict = {}
    with torch.no_grad():
        x = 0
        for i, layer in enumerate(pt_model.modules()):
            if isinstance(layer, torch.nn.Conv2d):
                # extract the weights and biases from the TensorFlow weights
                weight_tf = tf_weights[x*2]
                bias_tf = tf_weights[x*2+1]
             
                # convert the weights and biases to PyTorch format
                weight_pt = torch.tensor(weight_tf.transpose())
                bias_pt = torch.tensor(bias_tf)

                # get the name of the weight and bias tensors
                weight_name = list(pt_model.named_parameters())[x*2][0]
                bias_name = list(pt_model.named_parameters())[x*2+1][0]

                # set the weights and biases in the PyTorch model state_dict
                new_state_dict[weight_name]= weight_pt
                new_state_dict[bias_name] = bias_pt

                x = x + 1

            if isinstance(layer, torch.nn.ConvTranspose2d):
                weight_tf = tf_weights[x*2]
                bias_tf = tf_weights[x*2+1]
                
                # convert the weights and biases to PyTorch format
                weight_pt = torch.tensor(np.transpose(weight_tf, (2, 3, 0, 1)))
                bias_pt = torch.tensor(bias_tf)

                # get the name of the weight and bias tensors
                weight_name = list(pt_model.named_parameters())[x*2][0]
                bias_name = list(pt_model.named_parameters())[x*2+1][0]

                # set the weights and biases in the PyTorch model state_dict
                new_state_dict[weight_name] = weight_pt
                new_state_dict[bias_name] = bias_pt

                x = x + 1

    # load the new generated state_dict to pt_model
    pt_model.load_state_dict(new_state_dict)
    return pt_model

In this code, I copy the weights from a TensorFlow model to a PyTorch model, layer by layer. Each layer is a Conv2d or a ConvTranspose2d. I expected that, after loading the converted weights into the PyTorch model and running it on an image, the output would be similar to the TensorFlow model's output for the same image. Instead, the two outputs were very different.
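
Because the loop pairs TensorFlow and PyTorch tensors purely by position, a shape check before loading can catch a mismatch early. The following is a minimal sketch using NumPy only; check_converted_shapes is a hypothetical helper, and the parameter names and shapes in the usage example are invented for illustration.

```python
import numpy as np

def check_converted_shapes(converted, expected_shapes):
    """Return (name, got, expected) for every tensor whose shape is wrong.

    `converted` maps parameter names to NumPy arrays already permuted to the
    PyTorch layout; `expected_shapes` maps the same names to the shapes the
    PyTorch model reports.
    """
    mismatches = []
    for name, expected in expected_shapes.items():
        if name not in converted:
            mismatches.append((name, None, tuple(expected)))
        elif tuple(converted[name].shape) != tuple(expected):
            mismatches.append((name, tuple(converted[name].shape), tuple(expected)))
    return mismatches

# Invented example: two correct tensors and one with in/out channels swapped.
converted = {
    "conv1.weight": np.zeros((64, 3, 3, 3), dtype=np.float32),
    "conv1.bias": np.zeros((64,), dtype=np.float32),
    "up1.weight": np.zeros((64, 128, 2, 2), dtype=np.float32),
}
expected_shapes = {
    "conv1.weight": (64, 3, 3, 3),
    "conv1.bias": (64,),
    "up1.weight": (128, 64, 2, 2),
}
problems = check_converted_shapes(converted, expected_shapes)
print(problems)
```

With a real model, expected_shapes could be built from {k: tuple(v.shape) for k, v in pt_model.state_dict().items()}.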

Update: I compared the outputs of the two models after the first max pooling in the U-Net (that is, after two conv layers), and they were only slightly different. By comparison, the output of a randomly initialized PyTorch model was very different.
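
A comparison like this can be made quantitative by bringing both activations to the same layout and looking at the largest absolute difference. The sketch below assumes the TensorFlow activation is channels-last (N, H, W, C) and the PyTorch one is channels-first (N, C, H, W), both already converted to NumPy arrays; max_abs_diff is a hypothetical helper and the arrays are random stand-ins, not outputs of the actual models.

```python
import numpy as np

def max_abs_diff(tf_activation_nhwc, pt_activation_nchw):
    """Largest absolute element-wise difference after aligning layouts."""
    # Move channels from the last axis to axis 1: NHWC -> NCHW
    tf_as_nchw = np.transpose(tf_activation_nhwc, (0, 3, 1, 2))
    if tf_as_nchw.shape != pt_activation_nchw.shape:
        raise ValueError(f"shape mismatch: {tf_as_nchw.shape} vs {pt_activation_nchw.shape}")
    return float(np.max(np.abs(tf_as_nchw - pt_activation_nchw)))

# Stand-in data: the "PyTorch" activation is the "TensorFlow" one plus a small offset.
rng = np.random.default_rng(0)
tf_out = rng.standard_normal((1, 8, 8, 4)).astype(np.float32)
pt_out = np.transpose(tf_out, (0, 3, 1, 2)) + np.float32(1e-3)

diff = max_abs_diff(tf_out, pt_out)
print(diff)
```

Running the same check after each block of the network narrows down the first layer at which the two models start to diverge.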

Upvotes: 1

Views: 4264

Answers (1)

farzaneh

Reputation: 41

I finally (almost) solved the problem above. Apparently, you need to explicitly convert the TensorFlow weights to torch float tensors, as shown below. So, I replaced this:

# convert the weights and biases to PyTorch format
weight_pt = torch.tensor(np.transpose(weight_tf, (2, 3, 0, 1)))
bias_pt = torch.tensor(bias_tf)

# get the name of the weight and bias tensors
weight_name = list(pt_model.named_parameters())[x*2][0]
bias_name = list(pt_model.named_parameters())[x*2+1][0]

# set the weights and biases in the PyTorch model state_dict
new_state_dict[weight_name] = weight_pt
new_state_dict[bias_name] = bias_pt

with this (also cleaning up the code and removing the model loading):

layer.weight.data = torch.tensor(weight_tf.transpose(2, 3, 0, 1), dtype=torch.float)
layer.bias.data = torch.tensor(bias_tf, dtype=torch.float)
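
As a small illustration of the dtype point, torch.tensor keeps the dtype of the NumPy array it is given unless one is passed explicitly. The sketch below uses a made-up float64 array; whether the saved weights in this case were float64 is an assumption, since the post does not show their dtype.

```python
import numpy as np
import torch

# Made-up weights stored as float64 (an assumption for illustration only)
w64 = np.ones((2, 3), dtype=np.float64)

inferred = torch.tensor(w64)                      # dtype follows the NumPy array
explicit = torch.tensor(w64, dtype=torch.float)   # forced to 32-bit float

print(inferred.dtype, explicit.dtype)
```

Assigning a tensor to layer.weight.data replaces the stored tensor as-is, so its dtype carries over to the layer.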

After this, I got almost the same results after the first convolution layer. (The two outputs were still slightly different, but I ignored the difference.)

Also, I changed all nn.ConvTranspose2d layers to a simple upsampling followed by a convolution (as implemented in the original source). After this, the final output of my model was similar enough to that of the TensorFlow model. I think upsampling and conv2d with an even kernel size are implemented differently in PyTorch than in TensorFlow, and that causes the difference in outputs. However, since the difference was so small and didn't affect our goal, we ignored it.
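
One documented difference that could be relevant to the even-kernel suspicion is how padding is split. The sketch below follows the SAME padding rule described in the TensorFlow documentation, where an odd amount of total padding puts the extra element at the end (bottom/right); an integer padding argument to a PyTorch Conv2d pads both sides equally instead. The function name is hypothetical, and whether this explains the residual difference in this particular model is an assumption, not something confirmed here.

```python
import math

def tf_same_padding_1d(input_size, kernel_size, stride=1):
    """(pad_before, pad_after) along one axis for TensorFlow-style SAME padding."""
    output_size = math.ceil(input_size / stride)
    total = max((output_size - 1) * stride + kernel_size - input_size, 0)
    pad_before = total // 2
    pad_after = total - pad_before  # any odd remainder goes at the end
    return pad_before, pad_after

odd_kernel = tf_same_padding_1d(input_size=8, kernel_size=3)
even_kernel = tf_same_padding_1d(input_size=8, kernel_size=2)
print(odd_kernel, even_kernel)
```

For an even kernel the split is asymmetric, so reproducing it in PyTorch would take an explicit pad (for example with torch.nn.functional.pad) before an unpadded convolution.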

Upvotes: 3
