An appropriate way of adding a feature to a time series forecasting model input

Question

I have been working on a demand forecasting model for a while. I am using an LSTM model to predict the future demand of a company's product families. To make my raw data concrete, here is an example:

Unprocessed data

import numpy as np
import pandas as pd

np.random.seed(1)

raw_data = pd.DataFrame({"product_type": ["A"]*3 + ["B"]*3 + ["C"]*3, "product_family": ["x", "y", "z", "t", "u", "y", "p", "k", "l"]})

for col in [str(x)+"-"+str(y) for x in range(2015, 2020) for y in range(1, 13)]:
    raw_data[col] = np.random.randint(10, 50, 9)

raw_data.head()

  product_type product_family  2015-1  ...  2019-10  2019-11  2019-12
0            A              x      47  ...       15       39       38
1            A              y      22  ...       37       28       29
2            A              z      18  ...       41       41       37
3            B              t      19  ...       32       44       29
4            B              u      21  ...       22       29       25
[5 rows x 62 columns]

As can be seen above, the data has two nominal features (product_type and product_family); the remaining columns are the past monthly demand values.

First, let me explain what I currently do:

I first select the product_family to be predicted and let that product_family be "x":

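# Flatten the single matching row into a 1-D series of 60 monthly values;
# without ravel() the array has shape (1, 60) and the windowing loop below never runs
prod_family_data = raw_data.loc[raw_data.product_family == "x", raw_data.columns[2:]].to_numpy().ravel()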

Then I create the x and y of the training set:

x_train, y_train = [], []

for i in range(0, len(prod_family_data) - 12):
    x_train.append(prod_family_data[i: i + 12])
    y_train.append(prod_family_data[i + 12])

x_train = np.array(x_train)
y_train = np.array(y_train)

x_train

array([[47, 11, 21, 32, 34, 14, 35, 49, 44, 42, 31, 18],
       .  
       .
       .
       [14, 20, 45, 13, 48, 43, 45, 49, 49, 37, 15, 39]], dtype=object)

y_train

array([28, 38, 12, 12, 23, 29, 19, 23, 39, 38, 18, 40, 46, 48, 44, 27, 10,
       24, 25, 22, 15, 28, 44, 46, 22, 12, 45, 47, 38, 21, 46, 26, 12, 21,
       18, 14, 20, 45, 13, 48, 43, 45, 49, 49, 37, 15, 39, 38])

x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1)

x_train.shape

(48, 12, 1)

y_train.shape

(48,)

I then fit an LSTM model to predict this product_family's demand, go back to the start, select another product_family, and repeat.
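To make the per-family procedure concrete, here is a minimal sketch of the loop. It assumes random stand-in data with the same layout as the demand columns above; `make_windows` and `per_row_sets` are hypothetical names used only for illustration, and the model fitting itself is left out.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the demand columns of raw_data: 9 rows x 60 months (assumed random data)
demand = rng.integers(10, 50, size=(9, 60))

def make_windows(series, window=12):
    """Slice a 1-D series into (previous `window` values -> next value) pairs."""
    x = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    # LSTM layers expect (samples, timesteps, features)
    return x.reshape(len(x), window, 1), y

per_row_sets = []
for series in demand:
    x_train, y_train = make_windows(series)
    # A separate LSTM would be fitted on (x_train, y_train) at this point
    per_row_sets.append((x_train, y_train))

print(per_row_sets[0][0].shape, per_row_sets[0][1].shape)  # (48, 12, 1) (48,)
```

Each row gets its own training set and, in my current setup, its own model.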

What I would like to know: is there a way to add product_family (and, in the future, maybe product_type and other nominal product attributes) as an input feature, so that all product families can be fed to a single model at once?
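For illustration, one possible layout of such a combined input is sketched below: the family label is one-hot encoded and repeated on every timestep of each window, so a single array covers all families. This is only a sketch of the input shape under assumed random data and assumed unique family labels, not a claim that it is the best encoding; a learned embedding of the label would be another option.

```python
import numpy as np

rng = np.random.default_rng(1)
family_labels = np.array(["x", "y", "z", "t", "u"])  # assumed unique labels
demand = rng.integers(10, 50, size=(len(family_labels), 60)).astype(float)

window = 12
one_hot = np.eye(len(family_labels))  # row i is the one-hot code of family i

x_all, y_all = [], []
for fam_idx, series in enumerate(demand):
    for i in range(len(series) - window):
        past = series[i:i + window].reshape(window, 1)
        # Repeat the family code on every timestep of the window
        fam = np.tile(one_hot[fam_idx], (window, 1))
        x_all.append(np.hstack([past, fam]))
        y_all.append(series[i + window])

x_all = np.array(x_all)
y_all = np.array(y_all)
print(x_all.shape, y_all.shape)  # (240, 12, 6) (240,)
```

The last axis now holds the demand value followed by the five family indicator columns, so the number of input features per timestep is 6 instead of 1.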

Also, is there a way to tie the demand data to its timestamps in the input, so that the model can pick up the trend or seasonality of the data?
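To show what I mean by tying the timestamps to the input, here is a sketch in which the month of each observation is added as two extra features using a sine/cosine encoding. This is one common way to represent a yearly cycle, assumed here purely for illustration with random stand-in data; whether it actually helps the model is exactly what I am unsure about.

```python
import numpy as np

rng = np.random.default_rng(1)
series = rng.integers(10, 50, size=60).astype(float)  # stand-in for 2015-1 .. 2019-12
months = np.tile(np.arange(1, 13), 5)                 # month number of each column

# Cyclical encoding: December and January end up close to each other
month_sin = np.sin(2 * np.pi * (months - 1) / 12)
month_cos = np.cos(2 * np.pi * (months - 1) / 12)
features = np.column_stack([series, month_sin, month_cos])  # shape (60, 3)

window = 12
x = np.array([features[i:i + window] for i in range(len(features) - window)])
y = series[window:]
print(x.shape, y.shape)  # (48, 12, 3) (48,)
```

A running time index could be appended in the same way as a fourth column if the trend should be exposed to the model as well.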
