Reputation: 75
I am trying to predict a variable from 7 features over 4 time steps.
# Shape X_train: torch.Size([24433, 4, 7])
# Shape Y_train: torch.Size([24433, 4, 1])
# Shape X_test: torch.Size([6109, 4, 7])
# Shape Y_test: torch.Size([6109, 4, 1])
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset

train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
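For reference, the whole input pipeline can be sanity-checked with random tensors of the same shapes (a minimal, self-contained sketch; not my real data):
import torch
from torch.utils.data import TensorDataset, DataLoader

# Dummy tensors with the same shapes as the real data
X_demo = torch.randn(24433, 4, 7)
Y_demo = torch.randn(24433, 4, 1)
demo_loader = DataLoader(TensorDataset(X_demo, Y_demo), batch_size=32, shuffle=True)

X_batch, Y_batch = next(iter(demo_loader))
print(X_batch.shape, Y_batch.shape)
# torch.Size([32, 4, 7]) torch.Size([32, 4, 1])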
Example of data:
print(train_dataset[0], test_dataset[0])
(tensor([[ 7909.0000, 8094.0000, 9119.0000, 8666.0000, 17599.0000, 13657.0000,
10158.0000],
[ 7909.0000, 8073.0000, 9119.0000, 8636.0000, 17609.0000, 13975.0000,
10109.0000],
[ 7939.5000, 8083.5000, 9166.5000, 8659.5000, 18124.5000, 13971.0000,
10142.0000],
[ 7951.0000, 8064.0000, 9201.0000, 8663.0000, 17985.0000, 13967.0000,
10076.0000]]), tensor([[41.],
[41.],
[41.],
[41.]]))
(tensor([[ 8411.0000, 8530.0000, 9439.0000, 9101.0000, 17368.0000, 14174.0000,
11111.0000],
[ 8460.0000, 8651.5000, 9579.5000, 9355.5000, 17402.0000, 14509.0000,
11474.5000],
[ 8436.0000, 8617.0000, 9579.0000, 9343.0000, 17318.0000, 14288.0000,
11404.0000],
[ 8519.0000, 8655.0000, 9580.0000, 9348.0000, 17566.0000, 14640.0000,
11404.0000]]), tensor([[59.],
[59.],
[59.],
[59.]]))
I have created an LSTM model in PyTorch:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x
model = LSTMModel(input_size=7, hidden_size=256, output_size=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
I chose hidden_size=256 and this optimizer because they gave the lowest loss. (But whatever I choose, the problem remains.)
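For what it's worth, a forward pass on a random batch runs without errors, so the shapes line up (just a sketch to check this):
# Random batch shaped like one batch from train_loader
x = torch.randn(32, 4, 7)
print(model(x).shape)  # torch.Size([32, 4, 1])

# Note: nn.LSTM defaults to batch_first=False, so this model reads dim 0
# (the batch of 32) as the sequence axis and dim 1 (the 4 time steps) as
# the batch; the output shape still matches Y, so no error is raised.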
Then I train and apply the model (I collect the predictions in lists so I can check them):
pred_train = []
true_train = []

model.train()
# Loop over the training set
for X, Y in train_loader:
    optimizer.zero_grad()
    Y_pred = model(X)
    pred_train.append(Y_pred)
    true_train.append(Y)
    loss = loss_fn(Y_pred, Y)
    loss.backward()
    optimizer.step()
model.eval()
pred_test = []
true_test = []

# Loop over the test set
for X, Y in test_loader:
    Y_pred = model(X)
    pred_test.append(Y_pred)
    true_test.append(Y)
    loss = loss_fn(Y_pred, Y)
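As far as I know, the evaluation loop is normally wrapped in torch.no_grad() so no gradients are tracked; a sketch of that variant:
model.eval()
pred_test = []
true_test = []

# Disable gradient tracking during evaluation: saves memory and detaches
# the stored predictions from the autograd graph
with torch.no_grad():
    for X, Y in test_loader:
        Y_pred = model(X)
        pred_test.append(Y_pred)
        true_test.append(Y)
        loss = loss_fn(Y_pred, Y)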
When I check the predictions:
print(true_train[0], pred_train[0])  # or any index i; it looks like this for every iteration
print(true_test[0], pred_test[0])
I get (shortened):
# True train data (L) & predicted train data (R)
tensor([[[ 3.], tensor([[[ 0.1095],
[ 3.], [ 0.0221],
[ 3.], [ 0.0087],
[ 3.]], [-0.0308]],
[[100.], [[ 0.0922],
[ 0.], [ 0.0395],
[ 0.], [-0.0423],
[ 0.]], [-0.0592]],
[[ 57.], [[ 0.0228],
[ 57.], [-0.0332],
[ 57.], [ 0.0296],
[ 57.]], [ 0.0018]],
... ...
# True test data (L) & predicted test data (R)
tensor([[[ 59.], tensor([[[20.6179],
[ 59.], [20.6179],
[ 59.], [20.6179],
[ 59.]], [20.6179]],
[[ 70.], [[23.4562],
[ 70.], [23.4562],
[ 70.], [23.4562],
[ 70.]], [23.4562]],
[[ 0.], [[23.8913],
[ 0.], [23.8913],
[ 0.], [23.8913],
[ 0.]], [23.8913]],
... ...
[[23.9606],
[23.9606],
[23.9606],
[23.9606]],
Also interesting regarding the training predictions:
print(pred_train[0], pred_train[5], pred_train[10])
Returns:
tensor([[[ 0.1095],
[ 0.0221],
[ 0.0087],
[-0.0308]],
[[ 0.0922],
[ 0.0395],
[-0.0423],
[-0.0592]],
...
tensor([[[18.4983],
[18.4983],
[18.4983],
[18.4983]],
[[20.6157],
[21.0552],
[21.0552],
[21.0552]],
...
tensor([[[25.8706],
[25.8706],
[25.8706],
[25.8706]],
[[29.2633],
[29.2633],
[29.2633],
[29.2633]],
...
The further into training, the higher the predictions in the training loop become.
As you can see, the predictions (output) made in the test loop remain roughly the same; eventually they become constant: 23.9606.
But why is the output the same for every iteration in the test loop, and why do the predictions keep growing in the training loop? What am I doing wrong, and what should I do to get correct output?
Upvotes: 0
Views: 324
Reputation: 75
I somewhat solved this by normalizing my input data. I now obtain different predictions for each input. Whether they are any good is something I still have to figure out!
# Calculate the mean and standard deviation of each feature (per time step) in the training set
X_mean = X_train.mean(dim=0)
X_std = X_train.std(dim=0)
# Standardize the training set
X_train = (X_train - X_mean) / X_std
# Standardize the test set using the mean and standard deviation of the training set
X_test = (X_test - X_mean) / X_std
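Since Y spans roughly 0 to 100 while the raw inputs are in the thousands, standardizing the targets the same way (and mapping the predictions back afterwards) might help further; a sketch of that idea, which I have not verified on this data:
# Standardize the targets with training-set statistics only
Y_mean = Y_train.mean(dim=0)
Y_std = Y_train.std(dim=0)
Y_train = (Y_train - Y_mean) / Y_std
Y_test = (Y_test - Y_mean) / Y_std

# After prediction, map back to the original scale:
# Y_pred_original = Y_pred * Y_std + Y_mean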
Upvotes: 1