LSTM to Predict Pattern 010101... Understanding Hidden State

Question

I did a quick experiment to see if I could understand what the hidden state in an LSTM does...

I tried to make an LSTM predict the sequence [1, 0, 1, 0, 1, ...] from an input sequence X in which X[0] = 1 and every remaining element is random noise.

X = [1, randFloat, randFloat, randFloat...]
label = [1, 0, 1, 0...]

In my head, the model would understand:

- The inputs X mean nothing, or at least very little (they are noise), so it would mostly discard these values.
- Only the hidden state from the previous timestep n would be used to predict the next timestep n+1, giving [1, 0, 1, 0, ...].
- I also set X[0] = 1 in an attempt to guide the net to predict 1 for the first item (which it does).
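
To make the expectation above concrete, here is a minimal sketch in plain Python (not the LSTM itself) of the behaviour I had in mind: the output at each step depends only on a carried state, and the noisy inputs are ignored. The function name `alternating_from_state` is hypothetical and only used for illustration.

```python
def alternating_from_state(inputs, first=1):
    """Produce 1, 0, 1, 0, ... using only a carried state; inputs are ignored."""
    state = first
    outputs = []
    for _ in inputs:  # the input value itself is never read
        outputs.append(state)
        state = 1 - state  # the next output depends only on the current state
    return outputs

# Hand-written placeholder inputs: a leading 1 followed by arbitrary "noise".
example_inputs = [1.0, 0.37, 0.91, 0.12, 0.55, 0.08]
print(alternating_from_state(example_inputs))  # [1, 0, 1, 0, 1, 0]
```

This is the mapping the network would have to learn to represent in its hidden state.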

So, this didn't work. In theory, should it not? Can someone explain?

It essentially never converges and stays on the cusp of guessing between 0 and 1.
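
One way to put a number on "guessing between 0 and 1": with two classes, a model that assigns probability 0.5 to each class has a negative log-likelihood of ln 2 per timestep, so a printed loss hovering around that value means the predictions are at chance level. A minimal sketch using only the standard library; `chance_level_nll` is a hypothetical helper name, not something from PyTorch.

```python
import math

def chance_level_nll(num_classes):
    """Negative log-likelihood of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)

print(round(chance_level_nll(2), 4))  # 0.6931
```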

## Code
import numpy as np
import torch
import torch.nn.functional as F  # needed for F.log_softmax in forward()
import torch.optim as optim
from torch import nn

Create some fake data

sequence_1 = torch.tensor(np.random.uniform(size=50)).float().detach()
sequence_1[0] = 1
sequence_2 = torch.tensor(np.random.uniform(size=50)).float().detach()
sequence_2[0] = 1

labels_1 = np.zeros(50)
labels_1[::2] = 1
labels_1 = torch.tensor(labels_1, dtype=torch.long)
labels_2 = labels_1.clone()

training_data = [sequence_1, sequence_2]
label_data = [labels_1, labels_2]

Create simple LSTM Model

class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(LSTM, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, seq):
        lstm_out, _ = self.lstm(seq.view(len(seq), 1, -1))
        out = self.fc(lstm_out.view(len(seq), -1))
        out = F.log_softmax(out, dim=1)
        return out

We try to overfit on the dataset

INPUT_DIM = 1
HIDDEN_DIM = 6
model = LSTM(INPUT_DIM, HIDDEN_DIM, 2)

loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

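for epoch in range(500):
    for i, seq in enumerate(training_data):
        labels = label_data[i]
        model.zero_grad()  # clear gradients from the previous step
        scores = model(seq)  # log-probabilities, shape (seq_len, 2)
        loss = loss_function(scores, labels)
        loss.backward()
        print(loss.item())  # print the scalar loss value

        optimizer.step()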

with torch.no_grad():
    seq_d = training_data[0]
    tag_scores = model(seq_d)
    for score in tag_scores:
        print(score.argmax().item())  # predicted class (0 or 1) per timestep
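
Rather than reading the printed predictions by eye, the result can be summarised as a per-timestep accuracy. The following is a minimal sketch in plain Python; `sequence_accuracy` is a hypothetical helper, and the values in the toy check are hand-written, not output from the model above.

```python
def sequence_accuracy(predicted, expected):
    """Fraction of timesteps where the predicted class matches the label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    matches = sum(int(p == e) for p, e in zip(predicted, expected))
    return matches / len(expected)

# Toy check with hand-written values.
expected = [1, 0, 1, 0, 1, 0]
always_one = [1, 1, 1, 1, 1, 1]
print(sequence_accuracy(always_one, expected))  # 0.5
print(sequence_accuracy(expected, expected))    # 1.0
```

With the code above, the two arguments could be obtained with `tag_scores.argmax(dim=1).tolist()` and `label_data[0].tolist()`. An accuracy near 0.5 on the alternating labels would match the "guessing" behaviour described earlier.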
