Shaitender Singh

Reputation: 2207

Predict the future demand of product in multiple weeks

I want to create a model that predicts the future demand for each product several weeks ahead (for example, next year's weekly demand for each product).

I have a few small CSV files (around 100-200 records each).

Here is some information about the CSV columns: The first column, makeId, is the id of the product. The second column, areaId, is an internal id of the location where the product is sold. The third column, date, is the date in mm/dd/yyyy format. The fourth column, amount, is the demand for the given product in the given area for a given week.

Sample File 1: [image not reproduced]

Sample File 2: [image not reproduced]

Sample File 3: [image not reproduced]

I thought of going with an ARIMA model, but I'm a bit confused about how to get the data into a weekly format and use it to predict for each makeId.

Any suggestions would be helpful, as I'm new to time series problems.

Upvotes: 1

Views: 352

Answers (1)

Savage Henry

Reputation: 2069

NOTE: From a quick glance at your examples, it looks like you already have weekly data. The following answer will help if that is not true, or if you are just looking to set your dataframe up to be able to use ARIMA models.

The quick answer to your question is: use the pandas package to read in/manipulate your data into a dataframe object, then use the .resample() method with the weekly frequency, e.g.: .resample('W').

More details:

For time series analysis, most applications will benefit from setting the index of your data to the time variable. In your case, you can do this on reading in the data using pandas:

import pandas as pd
df = pd.read_csv('/path/to/your_data.csv', parse_dates=['date'], index_col='date')

If the data is already loaded and you need to set the index on an existing dataframe, you can do:

df = df.set_index('date')

This assumes that the date column already holds datetime values. If it holds strings, convert it first, e.g. with pd.to_datetime(df['date'], format='%m/%d/%Y').
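To make the two routes above concrete, here is a minimal sketch. The rows in csv_text are made up for illustration (the sample files are not reproduced here), and the column names follow the question's description (makeId, areaId, date, amount); adjust them if your files use different names or capitalization.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for one of the CSV files.
csv_text = """makeId,areaId,date,amount
1,10,01/05/2020,40
1,10,01/12/2020,55
2,10,01/05/2020,12
"""

# Route 1: parse the dates and set the index while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Route 2: read the dates as strings, convert them with an explicit
# mm/dd/yyyy format, then set the index on the existing dataframe.
raw = pd.read_csv(io.StringIO(csv_text))
raw["date"] = pd.to_datetime(raw["date"], format="%m/%d/%Y")
df2 = raw.set_index("date")

print(df.index)
```

Both routes should leave you with a DatetimeIndex. Spelling out the format in the second route avoids any ambiguity between mm/dd and dd/mm.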

The next step is to resample the data so that you have a new value that captures the weekly activity in your data. This requires choosing a method to combine the data in your Amount field, since you want to show a value that may combine the values from multiple days. Here I'll choose the mean(), so that the new value is an average of the data of those days present during that week.

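weekly = df['Amount'].resample('W').mean()

Since you are aggregating data, the function returns a new series with one row per week rather than one per original row. I keep it in its own variable, weekly, instead of assigning it back to df as a new column: that assignment aligns on the index, so every row whose date is not a week-ending date would be filled with NaN. Note that current pandas expects the aggregation to be called as a method on the resampler; the older how= argument is no longer accepted.

As a result, you will have a time-series indexed series holding the weekly-resampled data. This is an appropriate format to use in ARIMA models in a package like statsmodels. Because you want a forecast for each product, do the resampling separately for each makeId (for example by looping over df.groupby('makeId')) rather than on the whole file at once.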

Upvotes: 2
