PyTorch: LSTM predicts the same constant value

Question

I want to predict one variable from 7 features, using sequences of 4 time steps:

# Shape X_train: torch.Size([24433, 4, 7])
# Shape Y_train: torch.Size([24433, 4, 1])

# Shape X_test: torch.Size([6109, 4, 7])
# Shape Y_test: torch.Size([6109, 4, 1])

import torch
from torch import nn
from torch.utils.data import TensorDataset

train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

My (initial) LSTM model:

class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x

model = LSTMModel(input_size=7, hidden_size=256, output_size=1)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
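
A detail that may be worth checking in the model above: nn.LSTM defaults to batch_first=False, so it reads its input as (seq_len, batch, input_size). Batches shaped (32, 4, 7) would then be treated as 32 time steps over 4 sequences, and no error is raised because the output shape stays the same. The sketch below assumes the intended layout is batch first; it is not a confirmed fix for the constant predictions, and the class name LSTMModelBatchFirst and the smaller hidden size are hypothetical.

```python
import torch
from torch import nn


class LSTMModelBatchFirst(nn.Module):
    """Hypothetical variant of LSTMModel that treats dim 0 as the batch."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # batch_first=True makes the LSTM read input as (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)        # (batch, seq_len, hidden_size)
        return self.linear(out)      # (batch, seq_len, output_size)


model_bf = LSTMModelBatchFirst(input_size=7, hidden_size=16, output_size=1)
x = torch.randn(32, 4, 7)            # one batch shaped like the DataLoader output
y_pred = model_bf(x)                 # one prediction per sample and time step
```

With this layout each sample in the batch is processed as its own 4-step sequence, so changing one sample does not affect the predictions for another.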

Training and evaluating the model:

# Loop over the training set
for X, Y in train_loader:

    optimizer.zero_grad()
    
    Y_pred = model(X)

    loss = loss_fn(Y_pred, Y)
    
    loss.backward()
    
    optimizer.step()

model.eval()

# Loop over the test set
for X, Y in test_loader:

    Y_pred = model(X)
    
    loss = loss_fn(Y_pred, Y)
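
The test loop above keeps only the loss of the last batch and still tracks gradients. One option for reporting a single test loss is to disable gradient tracking and average over all batches. A minimal sketch, assuming a hypothetical helper named evaluate and stand-in data and model in place of the real ones:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, loss_fn):
    """Hypothetical helper: mean per-sample loss over every batch in loader."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():                      # no gradients needed for evaluation
        for X, Y in loader:
            loss = loss_fn(model(X), Y)        # mean loss over this batch
            total += loss.item() * X.size(0)   # weight by batch size
            count += X.size(0)
    return total / count


# Stand-in data and model, only to show the call; shapes mirror the question
X_demo = torch.randn(70, 4, 7)
Y_demo = torch.randn(70, 4, 1)
demo_loader = DataLoader(TensorDataset(X_demo, Y_demo), batch_size=32, shuffle=False)
demo_model = nn.Linear(7, 1)                   # placeholder for the LSTM model
test_loss = evaluate(demo_model, demo_loader, nn.MSELoss())
```

Weighting each batch by its size keeps the average correct when the last batch is smaller than the others.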

An example of Y (true data):

tensor([[[59.],
         [59.],
         [59.],
         [59.]],

        [[70.],
         [70.],
         [70.],
         [70.]],

        [[100.],
         [  0.],
         [  0.],
         [  0.]],

# etc.

However, my Y_pred looks something like this:

tensor([[[15.8224],
         [15.8224],
         [15.8224],
         [15.8224]],

        [[16.1654],
         [16.1654],
         [16.1654],
         [16.1654]],

        [[16.2127],
         [16.2127],
         [16.2127],
         [16.2127]],

# etc.

I have tried numerous different things:

Changing the model architecture (different batch size, different number of layers)
Adding dropout and decay parameters
Using epochs and changing the number of epochs when looping over training and test data
Different optimizers (Adam, SGD) with different learning rates
Log transforming my input data

Examples of my data are in a previous question.

I am fairly new to PyTorch and LSTMs, so I may be doing something wrong, but whatever I change, I keep getting a (near) constant value from the predictions. What am I doing wrong, and what should I be doing instead?

user17515752 · Accepted Answer

I solved this by standardizing my input data with the mean and standard deviation of the training set. The predictions now differ across outputs:

# Calculate the mean and standard deviation over the training samples (dim=0),
# which gives one value per time step and feature
X_mean = X_train.mean(dim=0)
X_std = X_train.std(dim=0)

# Standardize the training set
X_train = (X_train - X_mean) / X_std

# Standardize the test set using the mean and standard deviation of the training set
X_test = (X_test - X_mean) / X_std
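
The snippet above reduces over dim=0 only, so on input shaped (samples, 4, 7) it produces one mean and standard deviation per (time step, feature) pair. If a single value per feature is preferred, the reduction can cover both the sample and time dimensions. A minimal sketch, assuming a hypothetical standardize helper and a small eps guard (not in the original answer) against constant features:

```python
import torch


def standardize(train, test, eps=1e-8):
    """Scale both sets with per-feature statistics from the training set.

    train, test: tensors shaped (samples, time_steps, features).
    eps is an assumed guard against division by zero for constant features.
    """
    mean = train.mean(dim=(0, 1), keepdim=True)   # shape (1, 1, features)
    std = train.std(dim=(0, 1), keepdim=True)     # shape (1, 1, features)
    return (train - mean) / (std + eps), (test - mean) / (std + eps)


# Stand-in tensors with the same layout as X_train / X_test
X_tr = torch.randn(200, 4, 7) * 50 + 10
X_te = torch.randn(50, 4, 7) * 50 + 10
X_tr_s, X_te_s = standardize(X_tr, X_te)
```

As in the answer, the test set is scaled with the training statistics, so no information from the test data leaks into preprocessing.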
