Reputation: 684
I am trying to create a model that will predict the next row of values. There are 7 columns, but I am only using the first 6. I figure that if I pass in the datetimes in column 7 to the model, that will guarantee overfitting. Here is a screenshot of the DataFrame:
I am using an arbitrary number of rows, 100 in this case, to make this prediction. All I need to know is some way to create a DataLoader where the y value is the row that I want to predict, and the x value is the 100 preceding rows.
If there is a way to do this with a DataBlock, that would be preferred. I have thought about using .loc and .iloc, but I do not know how I would use those to create a DataLoader.
Upvotes: 1
Views: 423
Reputation: 96
Create your custom dataset like this:
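    import torch as T
    from torch.utils.data import Dataset, DataLoader

    class TimeSeriesDataset(Dataset):
        def __init__(self, df, input_features: list,
                     output_features: list, lookback=100, lookahead=1):
            self.df = df
            self.input_features = input_features    # columns fed to the model
            self.output_features = output_features  # columns to predict
            self.lookback = lookback                # the question asks for 100 preceding rows
            self.lookahead = lookahead              # kept for the signature; only one step ahead is used

        def __len__(self):
            # One sample for every row that has `lookback` rows before it.
            return len(self.df) - self.lookback

        def __getitem__(self, idx):
            idx += self.lookback
            lookback = self.df.iloc[idx - self.lookback:idx]  # the preceding rows
            lookahead = self.df.iloc[idx]                     # the row to predict
            # Cast to float32 so T.tensor does not receive an object array.
            lookback = lookback[self.input_features].values.astype("float32")
            lookahead = lookahead[self.output_features].values.astype("float32")
            X = T.tensor(lookback)
            y = T.tensor(lookahead)
            return X, y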
Then make your dataloader like this:
    dataset = TimeSeriesDataset(df, input_features, output_features)
    dataloader = DataLoader(dataset, batch_size=batch_size)
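To check that the windows line up the way the question describes, here is a minimal self-contained sketch on synthetic data. It is a variation on the class above, not the same code: the name `WindowDataset`, the column names `c0`..`c5` and `when`, and the sizes are all made up for illustration. It converts the selected columns to tensors once, so `__getitem__` only slices instead of calling `.iloc` on every access.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class WindowDataset(Dataset):
    """Hypothetical variant: X is the `lookback` rows before row i, y is row i."""

    def __init__(self, df, input_features, output_features, lookback=100):
        # Convert once up front so __getitem__ only slices tensors.
        self.x = torch.tensor(df[input_features].to_numpy(dtype="float32"))
        self.y = torch.tensor(df[output_features].to_numpy(dtype="float32"))
        self.lookback = lookback

    def __len__(self):
        # One sample for every row that has `lookback` rows before it.
        return len(self.x) - self.lookback

    def __getitem__(self, idx):
        target = idx + self.lookback
        return self.x[idx:target], self.y[target]


# Synthetic stand-in for the real DataFrame (assumed layout):
# 6 numeric columns plus a datetime column that is left out of the features.
cols = [f"c{i}" for i in range(6)]
df = pd.DataFrame(np.random.rand(500, 6), columns=cols)
df["when"] = pd.date_range("2021-01-01", periods=500, freq="D")

dataset = WindowDataset(df, cols, cols, lookback=100)
# shuffle=False keeps the batches in time order.
dataloader = DataLoader(dataset, batch_size=32, shuffle=False)

xb, yb = next(iter(dataloader))
print(len(dataset), xb.shape, yb.shape)
```

With 500 rows and a lookback of 100 there are 400 samples, and each batch has `xb` of shape `(32, 100, 6)` and `yb` of shape `(32, 6)`. The datetime column never reaches the model because it is not listed in the feature columns.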
Upvotes: 1