njho

Reputation: 2158

LSTM to Predict Pattern 010101... Understanding Hidden State

I did a quick experiment to see if I could understand what the hidden state in an LSTM does...

I tried to make an LSTM predict a sequence of [1, 0, 1, 0, 1, ...] based on an input sequence X, where X[0] = 1 and the remainder is random noise.

X = [1, randFloat, randFloat, randFloat...]
label = [1, 0, 1, 0...]

In my head, the model would understand:

  1. The inputs X mean nothing, or at least very little (since they're noise), so it would mostly discard these values
  2. Solely the hidden state from the previous sequence/timestep n would be used to predict the next timestep n+1... [1, 0, 1, 0...]
  3. I also set X[0] = 1 as an initial cue, in an attempt to guide the net to predict 1 on the first item (which it does)

So, this didn't work. In theory, shouldn't it? Can someone explain?

It essentially never converges, and stays on the cusp of guessing between 0 and 1.
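For reference, "guessing" has a concrete value here: a two-class model that effectively coin-flips assigns p = 0.5 to each class, so its NLL loss should sit near the baseline below (a quick sanity check, not part of the experiment itself):

import math

# a two-class model that only coin-flips assigns p = 0.5 to each class,
# so its NLL loss sits near -ln(0.5)
print(-math.log(0.5))  # ~0.693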


## Code
import numpy as np
import torch

from torch import nn
import torch.nn.functional as F  # needed for F.log_softmax in the forward pass
import torch.optim as optim

Create some fake data

# two 50-step noise sequences; the first element is fixed to 1 as a cue
sequence_1 = torch.tensor(np.random.uniform(size=50)).float().detach()
sequence_1[0] = 1
sequence_2 = torch.tensor(np.random.uniform(size=50)).float().detach()
sequence_2[0] = 1

# target pattern 1, 0, 1, 0, ... shared by both sequences
labels_1 = np.zeros(50)
labels_1[::2] = 1
labels_1 = torch.tensor(labels_1, dtype=torch.long)
labels_2 = labels_1.clone()

training_data = [sequence_1, sequence_2]
label_data = [labels_1, labels_2]

Create a simple LSTM model

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(LSTM, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, seq):
        # nn.LSTM expects (seq_len, batch, input_dim); here the batch size is 1
        lstm_out, _ = self.lstm(seq.view(len(seq), 1, -1))
        out = self.fc(lstm_out.view(len(seq), -1))
        out = F.log_softmax(out, dim=1)
        return out

We try to overfit on the dataset

INPUT_DIM = 1
HIDDEN_DIM = 6
model = LSTM(INPUT_DIM, HIDDEN_DIM, 2)

loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):  
    for i, seq in enumerate(training_data): 
        labels = label_data[i]
        model.zero_grad()
        scores = model(seq)
        loss = loss_function(scores, labels)
        loss.backward()
        print(loss.item())
        
        optimizer.step()
        

with torch.no_grad():
    seq_d = training_data[0]
    tag_scores = model(seq_d)
    for score in tag_scores:
        print(torch.argmax(score).item())

Upvotes: 0

Views: 111

Answers (1)

Nerveless_child

Reputation: 1412

I would say it's not meant to work.

The model always tries to make sense of, and find patterns in, the data it's trained on, i.e. sequence_1, and it uses labels_1 to "verify" that it has "found" them. Since the data is random, the model fails to find any pattern.

The pattern the model tries to find is not in the labels but in the data, so it doesn't matter how the labels are arranged. The labels never actually pass through the model, so no.
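As a quick illustration of that point (a hypothetical variant, not the original setup), you could put the pattern into the data itself by building the input from the shifted targets, so each timestep sees the previous target value:

# hypothetical variant: make the 1, 0, 1, 0 pattern visible in the input itself
shifted_input = torch.zeros(50)
shifted_input[0] = 1                       # start cue, like the original X[0] = 1
shifted_input[1:] = labels_1[:-1].float()  # input at step t is the target at step t - 1

scores = model(shifted_input)  # now the pattern is in the data the model actually sees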

If, however, you trained it on a single example, then it certainly could: the model would overfit, give you your ones and zeros, and fail miserably on other examples. Otherwise it just won't be able to make sense of the random data, no matter the dataset size.
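Concretely (a minimal sketch of the single-example case, reusing the names from the question's code):

# train on one example only: the model can memorise this one fixed noise
# sequence and recite its labels, but that tells it nothing about sequence_2
training_data = [sequence_1]
label_data = [labels_1]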

## Hidden State

> Solely the hidden state from the previous sequence/timestep n would be used to predict the next timestep n+1... [1, 0, 1, 0...]

Concerning the hidden state, note that it is not a trainable parameter; it is the result of performing operations on the data and the parameters, meaning that the input data determines the hidden state.

What the hidden state does is hold the information the model has extracted from previous timesteps and pass it on to the next timestep or to the output. In the case of an LSTM, the cell does some forgetting and updating before passing it on.
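A minimal standalone sketch of that (separate from the question's code): stepping an nn.LSTM one timestep at a time shows that (h, c) are ordinary tensors carried from step to step and never appear among the module's trainable parameters:

import torch
from torch import nn

lstm = nn.LSTM(input_size=1, hidden_size=6)

# hidden and cell state start as plain tensors, shape (num_layers, batch, hidden_size)
h = torch.zeros(1, 1, 6)
c = torch.zeros(1, 1, 6)

x = torch.randn(5, 1, 1)  # (seq_len, batch, input_size)
for t in range(5):
    # each step's output depends on the current input and the carried (h, c)
    out, (h, c) = lstm(x[t:t + 1], (h, c))

# only weights and biases are trainable parameters; h and c are not listed here
print([name for name, _ in lstm.named_parameters()])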

Upvotes: 1
