Reputation: 181
Trying to translate a simple LSTM model in Keras to PyTorch code. The Keras model converges after just 200 epochs, while the PyTorch model is still far from converged even after 8000 epochs.
This is the Keras code:
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# Six samples of three time steps each, one feature per step
X = array([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).reshape((6,3,1))
y = array([40,50,60,70,80,90])

model = Sequential()
model.add(LSTM(50, activation='relu', recurrent_activation='sigmoid', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=1)

# Predict the next value for an unseen sequence
x_input = array([70, 80, 90]).reshape((1, 3, 1))
yhat = model.predict(x_input, verbose=0)
print(yhat)
And this is the equivalent PyTorch code:
import torch
import torch.nn as nn
import torch.nn.functional as F

X = torch.tensor([10,20,30,20,30,40,30,40,50,40,50,60,50,60,70,60,70,80]).float().reshape(6,3,1)
y = torch.tensor([40,50,60,70,80,90]).float().reshape(6,1)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
        self.fc = nn.Linear(50, 1)

    def forward(self, x):
        batches = x.size(0)
        h0 = torch.zeros([1, batches, 50])
        c0 = torch.zeros([1, batches, 50])
        (x, _) = self.lstm(x, (h0, c0))
        x = x[:, -1, :]  # Keep only the output of the last time step: shape (6, 3, 50) -> (6, 50)
        x = F.relu(x)
        x = self.fc(x)
        return x

model = Model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

n_epochs = 8000
for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()
    y_ = model(X)
    loss = criterion(y_, y)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}/{n_epochs}, loss = {loss.item()}")

model.eval()
x_input = torch.tensor([70, 80, 90]).float().reshape((1, 3, 1))
yhat = model(x_input)
print(yhat)
The only difference I can see is in the initial weight and bias values, but I don't think slightly different weights and biases can account for such a big difference in behavior. What am I missing in the PyTorch code?
Upvotes: 5
Views: 3724
Reputation: 31
I think the problem is that you are re-initializing h0 and c0 on every forward pass, when they are only required for the initial step. Better to use the modified code below, which lets PyTorch default the hidden state to zeros and uses an RNN with a ReLU nonlinearity. You can go through this link for RNNs in PyTorch: https://pytorch.org/docs/stable/nn.html?highlight=rnn#torch.nn.RNN
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=50, num_layers=1, nonlinearity="relu", batch_first=True)
        self.fc = nn.Linear(50, 1)

    def forward(self, x):
        # batches = x.size(0)
        # h0 = torch.zeros([1, batches, 50])
        # c0 = torch.zeros([1, batches, 50])
        # (x, _) = self.lstm(x, (h0, c0))
        (x, _) = self.rnn(x)
        x = x[:, -1, :]  # Keep only the output of the last time step: shape (6, 3, 50) -> (6, 50)
        x = F.relu(x)
        x = self.fc(x)
        return x
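For reference, PyTorch's recurrent layers default the initial hidden (and cell) state to zeros when none is passed, which is why the explicit h0/c0 tensors can simply be dropped. A minimal sketch of the equivalent call for the original LSTM:

(x, _) = self.lstm(x)  # omitting (h0, c0) makes nn.LSTM default both states to zeros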
This gives a good prediction result within 2500 epochs. I would like to know why you have written the line below and what its purpose is, so that I can try to improve the model further.
x = x[:, -1, :]  # Keep only the output of the last time step: shape (6, 3, 50) -> (6, 50)
Upvotes: 1
Reputation: 6054
The difference in behaviour is caused by the activation function in the LSTM API. By changing the activation to tanh, I can reproduce the problem in Keras too:
model.add(LSTM(50, activation='tanh', recurrent_activation='sigmoid', input_shape=(3, 1)))
There is no option to change the activation function to 'relu' in the PyTorch LSTM API: https://pytorch.org/docs/stable/nn.html#lstm
Taking the LSTM implementation from https://github.com/huggingface/torchMoji/blob/master/torchmoji/lstm.py and changing hardsigmoid/tanh to sigmoid/relu, the model converges in PyTorch as well.
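To illustrate the kind of change involved, here is a minimal sketch of a single LSTM time step with the tanh activations swapped for ReLU (a hand-rolled step function for illustration, not the torchMoji code itself; the weight/bias parameter names are hypothetical):

import torch

def lstm_step_relu(x_t, h, c, w_ih, w_hh, b):
    # One LSTM time step with ReLU replacing tanh, mirroring
    # Keras's activation='relu', recurrent_activation='sigmoid'.
    gates = x_t @ w_ih.t() + h @ w_hh.t() + b   # (batch, 4 * hidden)
    i, f, g, o = gates.chunk(4, dim=1)          # input, forget, cell, output gates
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.relu(g)                   # tanh -> relu on the candidate cell state
    c_next = f * c + i * g
    h_next = o * torch.relu(c_next)     # tanh -> relu on the new cell state
    return h_next, c_next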
Upvotes: 5