Reputation: 2158
I did a quick experiment to see if I could understand what the hidden state in an LSTM does...
I tried to make an LSTM predict the sequence `[1, 0, 1, 0, 1, ...]` from an input sequence `X`, with `X[0] = 1` and the remainder as random noise.
X = [1, randFloat, randFloat, randFloat...]
label = [1, 0, 1, 0...]
In my head, the model would understand:
- The inputs `X` mean nothing, or at least very little (as they're noise), so it'd discard these values for the most part.
- Solely the hidden state from the previous timestep n would be used to predict the next timestep n+1, giving `[1, 0, 1, 0...]`.
- `X[0] = 1` is the one meaningful input, there in an attempt to guide the net to predicting 1 on the first item (which it does).

So, this didn't work. In theory, should it not? Can someone explain?
It essentially never converges, and sits on the cusp of guessing between 0 and 1.
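For reference, "on the cusp" has a concrete number attached to it. With two classes and `NLLLoss`, a model that assigns probability 0.5 to each class at every timestep scores a loss of ln 2 ≈ 0.693, so a printed loss hovering near that value would be consistent with coin-flip guessing. A minimal sketch of that baseline, assuming the same 50 alternating labels as in the code below (the names `log_probs` and `chance_loss` are just for illustration):

```python
import math

import torch
from torch import nn

# An undecided model: log(0.5) for both classes at each of the 50 timesteps.
log_probs = torch.full((50, 2), math.log(0.5))

# The alternating target 1, 0, 1, 0, ... used in the experiment.
labels = torch.zeros(50, dtype=torch.long)
labels[::2] = 1

# NLLLoss averages -log p(correct class) over the timesteps.
chance_loss = nn.NLLLoss()(log_probs, labels).item()
print(chance_loss, math.log(2))
```

A training loss that stays well below this baseline would mean the net is doing better than guessing.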
## Code
import numpy as np
import torch
import torch.nn.functional as F  # needed for F.log_softmax below
import torch.optim as optim
from torch import nn

# Two input sequences of uniform noise; only the first element is fixed to 1.
sequence_1 = torch.tensor(np.random.uniform(size=50)).float()
sequence_1[0] = 1
sequence_2 = torch.tensor(np.random.uniform(size=50)).float()
sequence_2[0] = 1

# Target for both sequences: alternating classes 1, 0, 1, 0, ...
labels_1 = np.zeros(50)
labels_1[::2] = 1
labels_1 = torch.tensor(labels_1, dtype=torch.long)
labels_2 = labels_1.clone()

training_data = [sequence_1, sequence_2]
label_data = [labels_1, labels_2]


class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, seq):
        # nn.LSTM expects (seq_len, batch, input_dim); the batch size is 1 here.
        lstm_out, _ = self.lstm(seq.view(len(seq), 1, -1))
        out = self.fc(lstm_out.view(len(seq), -1))
        # Per-timestep log-probabilities over the two classes.
        return F.log_softmax(out, dim=1)


INPUT_DIM = 1
HIDDEN_DIM = 6

model = LSTM(INPUT_DIM, HIDDEN_DIM, 2)
loss_function = nn.NLLLoss()  # pairs with the log_softmax output
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):
    for i, seq in enumerate(training_data):
        labels = label_data[i]
        model.zero_grad()
        scores = model(seq)
        loss = loss_function(scores, labels)
        loss.backward()
        print(loss.item())
        optimizer.step()

# Inspect the predicted class at each timestep of the first sequence.
with torch.no_grad():
    seq_d = training_data[0]
    tag_scores = model(seq_d)
    for score in tag_scores:
        print(torch.argmax(score).item())
Upvotes: 0
Views: 111
Reputation: 1412
I would say it's not meant to work.
The model always tries to make sense of, and find patterns in, the data it's trained on, i.e. `sequence_1`, and to "verify" that it has "found" them, it uses `labels_1`. Since the data is random, the model fails to find a pattern.
The pattern the model tries to find is not in the label but in the data, so it doesn't matter how the label is arranged. The label never actually passes through the model; it only enters through the loss. So no.
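To make the point about labels concrete, here is a small sketch. It uses a tiny linear classifier as a hypothetical stand-in (not the LSTM from the question), and the helper `output_and_grad` is made up for illustration. The forward pass takes only the data, so the scores are identical whichever labels you pick; the labels show up only in the loss, and therefore only in the gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in model: any module whose forward() takes only the data.
model = nn.Sequential(nn.Linear(1, 2), nn.LogSoftmax(dim=1))
x = torch.rand(4, 1)
labels_a = torch.tensor([1, 0, 1, 0])
labels_b = torch.tensor([0, 1, 0, 1])


def output_and_grad(labels):
    """Return the model scores and the weight gradient for the given labels."""
    model.zero_grad()
    scores = model(x)  # the labels are not an input here
    nn.NLLLoss()(scores, labels).backward()  # the labels enter via the loss only
    return scores.detach().clone(), model[0].weight.grad.clone()


scores_a, grad_a = output_and_grad(labels_a)
scores_b, grad_b = output_and_grad(labels_b)

# Same scores for both label sets, but different gradients.
print(torch.equal(scores_a, scores_b), torch.equal(grad_a, grad_b))
```

So the labels still steer what the weights become during training; they are just never something the model gets to read at prediction time.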
If, perhaps, you trained it on a single example, then definitely: the model would overfit and give you your ones and zeros, but fail miserably on other examples. Otherwise it just won't be able to make sense of the random data, no matter the size.
> Solely the hidden state from the previous sequence/timestep n would be used to predict the next timestep n+1... [1, 0, 1, 0...]
Concerning the hidden state, note that it is not a trainable parameter: it is the result of performing operations on the data and the parameters, meaning that the input data determines the hidden state.
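You can see this for yourself with a quick sketch, assuming a standalone `nn.LSTM` with the same sizes as in the question (input size 1, hidden size 6). The only trainable tensors are the weights and biases; the hidden state is returned by the forward pass, and it changes when the input changes.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=6)

# The trainable parameters: input-hidden and hidden-hidden weights and biases.
param_names = [name for name, _ in lstm.named_parameters()]
print(param_names)

# Two different noise sequences, shaped (seq_len, batch, input_size).
x_a = torch.rand(50, 1, 1)
x_b = torch.rand(50, 1, 1)

with torch.no_grad():
    _, (h_a, c_a) = lstm(x_a)
    _, (h_b, c_b) = lstm(x_b)
    _, (h_a_again, _) = lstm(x_a)

print(h_a.shape)  # (num_layers, batch, hidden_size)
# Same input gives the same final hidden state; a different input does not.
print(torch.allclose(h_a, h_a_again), torch.allclose(h_a, h_b))
```

No hidden-state tensor appears in `named_parameters()`, which is why the optimizer never updates it directly.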
What the hidden state does is hold the information the model has extracted from the previous timesteps and pass it on to the next timestep or out as output. In the case of an LSTM, it does some forgetting and updating before passing it on.
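A minimal sketch of that hand-off, assuming a standalone `nn.LSTM` and a short random sequence (the variable names are mine): feeding the sequence one timestep at a time while passing the returned `(h, c)` state back in should reproduce what the layer computes when given the whole sequence at once.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=6)
seq = torch.rand(5, 1, 1)  # (timesteps, batch, features)

with torch.no_grad():
    # Whole sequence in one call; the layer carries the state internally.
    full_out, _ = lstm(seq)

    # One timestep per call, carrying the (hidden, cell) state by hand.
    state = None  # None means zero initial hidden and cell state
    step_outs = []
    for x_t in seq:
        out_t, state = lstm(x_t.unsqueeze(0), state)
        step_outs.append(out_t)
    stepped_out = torch.cat(step_outs, dim=0)

print(torch.allclose(full_out, stepped_out, atol=1e-6))
```

The cell state `c` is where the forgetting and updating happens; the hidden state `h` is what gets exposed as the output at each step.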
Upvotes: 1