Reputation: 1334
I am trying to mimic a PyTorch neural network in Keras.
I am confident that my Keras version of the network is very close to the PyTorch one, but during training I see that the loss values of the PyTorch network are much lower than the loss values of the Keras network. I wonder if this is because I have not properly copied the PyTorch network in Keras, or whether the loss computation is different in the two frameworks.
PyTorch loss definition:
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=5e-4)
Keras loss definition:
sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy'])
Note that all the layers in the Keras network have been implemented with L2 regularization (kernel_regularizer=regularizers.l2(5e-4)). I also used he_uniform initialization, which I believe is the default in PyTorch, according to the source code. The batch size for the two networks is the same: 128.
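For reference, a layer in my network is declared roughly like this (the layer type, filter count, and kernel size here are placeholders, not my actual architecture):

from keras import regularizers
from keras.layers import Conv2D

# Illustrative layer only; filter count and kernel size are placeholders
conv = Conv2D(64, (3, 3), padding='same',
              kernel_initializer='he_uniform',           # he_uniform init
              kernel_regularizer=regularizers.l2(5e-4))  # L2 penalty mirroring weight_decay=5e-4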
In the PyTorch version, the loss starts around 4.1209 and decreases to around 0.5. In Keras it starts around 30 and decreases to 2.5.
Upvotes: 5
Views: 5416
Reputation: 3580
Keras categorical_crossentropy by default uses from_logits=False, which means it assumes y_pred contains probabilities, not raw scores (source). You can choose whether or not to use a softmax/sigmoid layer at the end of your model; just make sure to set the from_logits argument accordingly.
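As a minimal sketch using the tf.keras API (the values are illustrative), both calls below print the same loss:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])  # raw, unnormalized scores
labels = tf.constant([[1.0, 0.0, 0.0]])  # one-hot target

# Loss on raw logits: softmax is applied internally
loss_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(loss_logits(labels, logits).numpy())

# Equivalent: apply softmax yourself, then use the default from_logits=False
probs = tf.nn.softmax(logits)
loss_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(loss_probs(labels, probs).numpy())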
PyTorch CrossEntropyLoss accepts unnormalized scores for each class, i.e., not probabilities (source). Thus, if using CrossEntropyLoss, you should not use a softmax/sigmoid layer at the end of your model.
If this confuses you, please read this discuss.pytorch post.
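A minimal sketch of the PyTorch side (again, the values are illustrative):

import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1]])  # raw model outputs, no softmax applied
target = torch.tensor([0])                # class index, not one-hot

# CrossEntropyLoss combines LogSoftmax and NLLLoss, so it expects raw scores;
# applying softmax first would double-normalize and silently give a wrong loss
loss_fn = nn.CrossEntropyLoss()
print(loss_fn(logits, target))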
Upvotes: 13
Reputation: 1334
In my case, the reason the displayed losses in the two models were different is that Keras prints the sum of the cross-entropy loss and the regularization term, whereas in the PyTorch model only the categorical cross-entropy was printed.
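For anyone who wants to verify this: Keras collects the regularization penalties in model.losses, so the cross-entropy part can be recovered by subtracting their sum from the reported loss. A sketch, assuming the compiled resnet model from the question:

import keras.backend as K

# Sum of all L2 penalties added through kernel_regularizer
reg_term = K.eval(sum(resnet.losses))

# Keras reports (cross entropy + reg_term) during training;
# subtracting reg_term gives a value comparable to PyTorch's nn.CrossEntropyLoss
print('regularization contribution:', reg_term)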
Upvotes: 3