Reputation: 181
Trying to translate a simple LSTM model in Keras to PyTorch code. The Keras model converges after just 200 epochs, while the PyTorch model is still far from converged even after 8000 epochs.
This is the Keras code:
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# Six samples of three time steps each, one feature per step
X = array([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).reshape((6,3,1))
y = array([40,50,60,70,80,90])

model = Sequential()
model.add(LSTM(50, activation='relu', recurrent_activation='sigmoid', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=1)

# Predict the next value for an unseen sequence
x_input = array([70, 80, 90]).reshape((1, 3, 1))
yhat = model.predict(x_input, verbose=0)
print(yhat)
And this is the equivalent PyTorch code:
import torch
import torch.nn as nn
import torch.nn.functional as F

X = torch.tensor([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).float().reshape(6,3,1)
y = torch.tensor([40,50,60,70,80,90]).float().reshape(6,1)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
        self.fc = nn.Linear(50, 1)

    def forward(self, x):
        batches = x.size(0)
        h0 = torch.zeros([1, batches, 50])
        c0 = torch.zeros([1, batches, 50])
        (x, _) = self.lstm(x, (h0, c0))
        x = x[:, -1, :]  # Keep only the output of the last time step: shape (6, 3, 50) -> (6, 50)
        x = F.relu(x)
        x = self.fc(x)
        return x

model = Model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

n_epochs = 8000
for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()
    y_ = model(X)
    loss = criterion(y_, y)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}/{n_epochs}, loss = {loss.item()}")

model.eval()
x_input = torch.tensor([70, 80, 90]).float().reshape((1, 3, 1))
yhat = model(x_input)
print(yhat)
The only difference I can see is in the initial weight and bias values, but I don't think slightly different weights and biases can account for such a big difference in behavior. What am I missing in the PyTorch code?
Upvotes: 5
Views: 3724
Reputation: 31
I think the problem is that you are re-initializing h0 and c0 on every forward pass, when they are only required for the initial step. Better to use the modified code below, which lets PyTorch default the hidden state to zeros and uses an RNN with a ReLU nonlinearity. You can go through this link for RNNs in PyTorch: https://pytorch.org/docs/stable/nn.html?highlight=rnn#torch.nn.RNN
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=50, num_layers=1, nonlinearity="relu", batch_first=True)
        self.fc = nn.Linear(50, 1)

    def forward(self, x):
        # batches = x.size(0)
        # h0 = torch.zeros([1, batches, 50])
        # c0 = torch.zeros([1, batches, 50])
        # (x, _) = self.lstm(x, (h0, c0))
        (x, _) = self.rnn(x)
        x = x[:, -1, :]  # Keep only the output of the last time step: shape (6, 3, 50) -> (6, 50)
        x = F.relu(x)
        x = self.fc(x)
        return x
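For reference, PyTorch's recurrent layers default the initial hidden (and cell) state to zeros when none is passed, which is why the explicit h0/c0 tensors can simply be dropped. A minimal sketch of the equivalent call for the original LSTM:

(x, _) = self.lstm(x)  # omitting (h0, c0) makes nn.LSTM default both states to zeros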
This gives a good prediction result within 2500 epochs. I would like to know why you have written the line below and what its purpose is, so that I can try to improve the model further.
x = x[:, -1, :]  # Keep only the output of the last time step: shape (6, 3, 50) -> (6, 50)
Upvotes: 1
Reputation: 6054
The difference in behaviour is caused by the activation function in the LSTM API. By changing the activation to tanh, I can reproduce the problem in Keras too:
model.add(LSTM(50, activation='tanh', recurrent_activation='sigmoid', input_shape=(3, 1)))
There is no option to change the activation function to 'relu' in the PyTorch LSTM API: https://pytorch.org/docs/stable/nn.html#lstm
Taking the LSTM implementation from https://github.com/huggingface/torchMoji/blob/master/torchmoji/lstm.py and changing hardsigmoid/tanh to sigmoid/relu, the model converges in PyTorch as well.
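To illustrate the kind of change involved, here is a minimal sketch of a single LSTM time step with the tanh activations swapped for ReLU (a hand-rolled step function for illustration, not the torchMoji code itself; the weight/bias parameter names are hypothetical):

import torch

def lstm_step_relu(x_t, h, c, w_ih, w_hh, b):
    # One LSTM time step with ReLU replacing tanh, mirroring
    # Keras's activation='relu', recurrent_activation='sigmoid'.
    gates = x_t @ w_ih.t() + h @ w_hh.t() + b   # (batch, 4 * hidden)
    i, f, g, o = gates.chunk(4, dim=1)          # input, forget, cell, output gates
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.relu(g)                   # tanh -> relu on the candidate cell state
    c_next = f * c + i * g
    h_next = o * torch.relu(c_next)     # tanh -> relu on the new cell state
    return h_next, c_next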
Upvotes: 5