SSuram
SSuram

Reputation: 61

Suggestions for feature engineering

I am having a problem during feature engineering. Looking for some suggestions. Problem statement: I have usage data of multiple customers for 3 days. Some have just 1 day usage some 2 and some 3. Data is related to number of emails sent / contacts added on each day etc.

I am converting this time series data to column-wise ie., number of emails sent by a customer on day1 as one feature, number of emails sent by a customer on day2 as one feature and so on. But problem is that, the usage can be of either increasing order or decreasing order for different customers.

ie., example 1: customer 'A' --> 'number of emails sent on 1st . day' = 100 . ' number of emails sent on 2nd day'=0

example 2: customer 'B' --> 'number of emails sent on 1st . day' = 0 . ' number of emails sent on 2nd day'=100

example 3: customer 'C' --> 'number of emails sent on 1st . day' = 0 . ' number of emails sent on 2nd day'=0

example 4: customer 'D' --> 'number of emails sent on 1st . day' = 100 . ' number of emails sent on 2nd day'=100

In the first two cases => My new feature will have "-100" and "100" as values. Which I guess is good for differentiating. But the problem arises for 3rd and 4th columns when the new feature value will be "0" in both scenarios Can anyone suggest a way to handle this

Upvotes: 2

Views: 137

Answers (1)

Pascal Zoleko
Pascal Zoleko

Reputation: 699

You can extract the following features:

  1. Simple Moving Averages for day 2 and day 3 respectively. This means you now have two extra columns.

  2. Percentage Change from previous day

  3. Percentage Change from day 1 to 3

Upvotes: 1

Related Questions