Reputation: 147
I have modified the PyTorch tutorial on LSTMs (sine-wave prediction: given the sine values at steps [0:N], predict the values at [N:2N]) to use the Adam optimizer instead of LBFGS. However, the model does not train well and cannot predict the sine wave correctly. Since Adam is the optimizer most commonly used for RNN training, I wonder how this issue can be resolved. I also wonder if the sequence-in-sequence-out code segment (done with a loop: for input_t in input.split(1, dim=1)) can be done by a PyTorch module or function.
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib
#matplotlib.use('Agg')
import matplotlib.pyplot as plt
class Sequence(nn.Module):
    def __init__(self):
        super(Sequence, self).__init__()
        self.lstm1 = nn.LSTMCell(1, 51)
        self.lstm2 = nn.LSTMCell(51, 51)
        self.linear = nn.Linear(51, 1)

    def forward(self, input, future=0):
        outputs = []
        h_t = torch.zeros(input.size(0), 51, dtype=torch.double)
        c_t = torch.zeros(input.size(0), 51, dtype=torch.double)
        h_t2 = torch.zeros(input.size(0), 51, dtype=torch.double)
        c_t2 = torch.zeros(input.size(0), 51, dtype=torch.double)
        # sequence-in, sequence-out: feed the wave one time step at a time
        for input_t in input.split(1, dim=1):
            h_t, c_t = self.lstm1(input_t, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            output = self.linear(h_t2)
            outputs += [output]
        for i in range(future):  # if we should predict the future
            # autoregressive: each step's input is the previous output
            h_t, c_t = self.lstm1(output, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            output = self.linear(h_t2)
            outputs += [output]
        outputs = torch.cat(outputs, dim=1)
        return outputs
if __name__ == '__main__':
    # set random seed to 0
    np.random.seed(0)
    torch.manual_seed(0)
    # load data and make training set
    data = torch.load('traindata.pt')
    input = torch.from_numpy(data[3:, :-1])
    target = torch.from_numpy(data[3:, 1:])
    test_input = torch.from_numpy(data[:3, :-1])
    test_target = torch.from_numpy(data[:3, 1:])
    print("input.size", input.size())
    print("target.size", target.size())
    # build the model
    seq = Sequence()
    seq.double()
    criterion = nn.MSELoss()
    # use Adam instead of the tutorial's LBFGS optimizer
    optimizer = optim.Adam(seq.parameters(), lr=0.005)
    # begin to train
    for i in range(15):
        print('STEP: ', i)
        seq.train()

        # closure kept from the tutorial's LBFGS version; Adam does not
        # require one, so it is simply called once per step here
        def run1step():
            optimizer.zero_grad()
            out = seq(input)
            loss = criterion(out, target)
            print('train loss:', loss.item())
            loss.backward()
            return loss

        run1step()
        optimizer.step()
        # begin to predict, no need to track gradient here
        seq.eval()
        with torch.no_grad():
            future = 1000
            pred = seq(test_input, future=future)
            loss = criterion(pred[:, :-future], test_target)
            print('test loss:', loss.item())
            y = pred.detach().numpy()

        # draw the result
        def draw(yi, color):
            plt.figure(figsize=(30, 10))
            plt.title('Predict future values for time sequences\n(Dashlines are predicted values)', fontsize=30)
            plt.xlabel('x', fontsize=20)
            plt.ylabel('y', fontsize=20)
            plt.xticks(fontsize=20)
            plt.yticks(fontsize=20)
            plt.plot(np.arange(input.size(1)), yi[:input.size(1)], color, linewidth=2.0)
            plt.plot(np.arange(input.size(1), input.size(1) + future), yi[input.size(1):], color + ':', linewidth=2.0)
            plt.show()

        if i == 14:
            draw(y[0], 'r')
            draw(y[1], 'g')
            draw(y[2], 'b')
            plt.savefig('predict_LSTM%d.pdf' % i)
            #plt.close()
Upvotes: 0
Views: 1841
Reputation: 771
I've just executed both your code and the original tutorial code. I think the problem is that you didn't train with Adam long enough: your training loss is still decreasing at step 15. So I changed the number of steps from 15 to 45, and this is the figure generated after step 40:

The original code reached a loss of 4e-05 after step 4, but after that the loss somehow exploded. Your code with Adam keeps reducing the loss across all 45 steps, yet the final loss is around 0.001. I hope I ran both programs correctly.
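For reference, the only change is the loop count; and since Adam, unlike LBFGS, does not need a closure passed to optimizer.step(), the training step can also be inlined. A minimal sketch, reusing seq, input, target, criterion, and optimizer exactly as defined in your script:

    for i in range(45):  # 45 steps instead of 15
        print('STEP: ', i)
        seq.train()
        optimizer.zero_grad()
        out = seq(input)
        loss = criterion(out, target)
        print('train loss:', loss.item())
        loss.backward()
        optimizer.step()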
Oh, regarding your second question:

"I also wonder if the sequence-in-sequence-out code segment (done with a loop: for input_t in input.split(1, dim=1)) can be done by a PyTorch module or function."
Yes, you can write a function or define a module with two LSTMs to do that. But it hardly makes sense here, since your network contains only these two LSTMs; you have to do this "wiring" work at some point anyway.
If your network contains several such blocks, you can write a module with two LSTMs and use it as a primitive building block, e.g. self.BigLSTM = BigLSTM(...), just like you define self.lstm1 = nn.LSTMCell(...).
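For illustration, here is a minimal sketch of that idea. The name BigLSTM, the default sizes, and the decision to return the stacked hidden states are my choices for the example, not anything fixed by PyTorch:

import torch
import torch.nn as nn

class BigLSTM(nn.Module):
    """Two stacked LSTMCells plus the per-time-step wiring loop."""
    def __init__(self, input_size=1, hidden_size=51):
        super(BigLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.lstm1 = nn.LSTMCell(input_size, hidden_size)
        self.lstm2 = nn.LSTMCell(hidden_size, hidden_size)

    def forward(self, input):
        # input: (batch, seq_len) -> output: (batch, seq_len, hidden_size)
        h_t = torch.zeros(input.size(0), self.hidden_size, dtype=input.dtype)
        c_t = torch.zeros_like(h_t)
        h_t2 = torch.zeros_like(h_t)
        c_t2 = torch.zeros_like(h_t)
        outputs = []
        for input_t in input.split(1, dim=1):  # one time step at a time
            h_t, c_t = self.lstm1(input_t, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            outputs.append(h_t2)
        return torch.stack(outputs, dim=1)

Note that the autoregressive "future" loop still has to run one step at a time either way, because each step's input is the previous step's output.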
Upvotes: 1