Peter Bergman

Reputation: 684

How do I create DataLoaders from rows of a DataFrame?

I am trying to create a model that will predict the next row of values. There are 7 columns, but I am only using the first 6, because I figure that passing the datetimes in column 7 to the model would all but guarantee overfitting. Here is a screenshot of the DataFrame: [image: DataFrame with X and Y values shown]

I am using an arbitrary number of rows, 100 in this case, to make this prediction. All I need to know is some way to create a DataLoader where the y value is the row that I want to predict, and the x value is the 100 preceding rows.

If there is a way to do this with a DataBlock, that would be preferred. I have thought about using .loc and .iloc, but I do not know how I would use those to create a DataLoader.

Upvotes: 1

Views: 423

Answers (1)

Stupid Loser

Reputation: 96

Create your custom dataset like this:

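import torch as T
from torch.utils.data import Dataset

class TimeSeriesDataset(Dataset):
    def __init__(self, df, input_features: list,
                 output_features: list, lookback=99, lookahead=1):
        self.df = df
        # Store the column lists; __getitem__ needs them.
        self.input_features = input_features
        self.output_features = output_features
        self.lookback = lookback
        self.lookahead = lookahead  # stored but unused: only one row ahead is returned

    def __len__(self):
        # Every row that has `lookback` rows before it can serve as a target.
        return len(self.df) - self.lookback

    def __getitem__(self, idx):
        idx += self.lookback
        lookback = self.df.iloc[idx-self.lookback:idx]  # the preceding rows
        lookahead = self.df.iloc[idx]                   # the row to predict
        # Cast to float32 so T.tensor receives a numeric array, not an object array.
        lookback = lookback[self.input_features].to_numpy(dtype="float32")
        lookahead = lookahead[self.output_features].to_numpy(dtype="float32")
        X = T.tensor(lookback)
        y = T.tensor(lookahead)
        return X, y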

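Then make your dataloader like this:

from torch.utils.data import DataLoader

# input_features and output_features are lists of column names (your first 6 columns);
# pass lookback=100 to use the 100 preceding rows from the question.
dataset = TimeSeriesDataset(df, input_features, output_features, lookback=100)
dataloader = DataLoader(dataset, batch_size=batch_size)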
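
If the DataFrame fits comfortably in memory, another option is to build every window up front instead of slicing per item. The sketch below is only an illustration of that idea, not part of the answer above: `make_windows`, the `demo` DataFrame, and its column names are made up for the example, and it assumes all feature columns are numeric.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(df, feature_cols, lookback=100):
    """Return (X, y): X[i] holds `lookback` consecutive rows, y[i] is the row after them."""
    values = df[feature_cols].to_numpy(dtype="float32")
    # sliding_window_view appends the window axis last: (n_windows, n_features, lookback).
    windows = sliding_window_view(values, lookback, axis=0)
    # Drop the final window (no row follows it) and reorder to (sample, time, feature).
    # ascontiguousarray copies the read-only view into a normal writable array.
    X = np.ascontiguousarray(windows[:-1].transpose(0, 2, 1))
    y = values[lookback:]
    return X, y


# Hypothetical stand-in data: 250 rows, 6 numeric columns plus a datetime column.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(250, 6)), columns=list("abcdef"))
demo["timestamp"] = pd.date_range("2021-01-01", periods=250, freq="D")

X, y = make_windows(demo, list("abcdef"), lookback=100)
print(X.shape, y.shape)  # (150, 100, 6) (150, 6)
```

The resulting arrays can be converted with torch.from_numpy and wrapped in torch.utils.data.TensorDataset before being handed to a DataLoader. The trade-off is memory: each row is copied up to `lookback` times, so the per-item slicing approach is the safer choice for large DataFrames.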

Upvotes: 1
