Reputation: 3205
So I am working with different deep learning frameworks as part of my research and have observed something weird (at least I cannot explain the cause of it).
I trained a fairly simple MLP model (on mnist dataset) in Tensorflow, extracted trained weights, created the same model architecture in PyTorch and applied the trained weights to PyTorch model. Now my expectation is to get same test accuracy from both Tensorflow and PyTorch models but this isn't the case. I get different results.
So my question is: If a model is trained to some optimal value, shouldn't the trained weights produce same results every time testing is done on the same dataset (regardless of the framework used)?
PyTorch Model:
class Net(nn.Module):
def __init__(self) -> None:
super(Net, self).__init__()
self.fc1 = nn.Linear(784, 24)
self.fc2 = nn.Linear(24, 10)
def forward(self, x: Tensor) -> Tensor:
x = torch.flatten(x, 1)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return x
Tensorflow Model:
def build_model() -> tf.keras.Model:
# Build model layers
model = models.Sequential()
# Flatten Layer
model.add(layers.Flatten(input_shape=(28,28)))
# Fully connected layer
model.add(layers.Dense(24, activation='relu'))
model.add(layers.Dense(10))
# compile the model
model.compile(
optimizer='sgd',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy']
)
# return newly built model
return model
To extract weights from Tensorflow model and apply them to Pytorch model I use following functions:
Extract Weights:
def get_weights(model):
# fetch latest weights
weights = model.get_weights()
# transpose weights
t_weights = []
for w in weights:
t_weights.append(np.transpose(w))
# return
return t_weights
Apply Weights:
def set_weights(model, weights):
"""Set model weights from a list of NumPy ndarrays."""
state_dict = OrderedDict(
{k: torch.Tensor(v) for k, v in zip(model.state_dict().keys(), weights)}
)
self.load_state_dict(state_dict, strict=True)
Upvotes: 3
Views: 1323
Reputation:
Providing solution in answer section for the benefit of community. From comments
If you are using the same weights in the same manner then results should be the same, though float rounding error should also be accounted. Also it doesn't matter if model is trained at all. You can think of your model architecture as a chain of matrix multiplications with element-wise nonlinearities in between. How big is the difference? Are you comparing model outputs, our metrics computed over dataset? As a suggestion, intialize model with some random values in Keras, do a forward pass for a single batch (paraphrased from jdehesa and Taras Sereda)
Upvotes: 2