F_M

Reputation: 311

Create new columns based on other columns' values

I'm trying to do some feature engineering for a pandas data frame.

Say I have this:

Data frame 1:
X |  date     | is_holiday
a |  1/4/2018 | 0
a |  1/5/2018 | 0
a |  1/6/2018 | 1
a |  1/7/2018 | 0
a |  1/8/2018 | 0
...
b |  1/1/2018 | 1

I'd like to add indicator columns that flag whether a date falls 1 or 2 days before a holiday, and likewise 1 or 2 days after one.

Desired data frame:
X |  date     | is_holiday | one_day_before_hol | ... | one_day_after_hol
a |  1/4/2018 | 0          | 0                  | ... | 0
a |  1/5/2018 | 0          | 1                  | ... | 0
a |  1/6/2018 | 1          | 0                  | ... | 0
a |  1/7/2018 | 0          | 0                  | ... | 1
a |  1/8/2018 | 0          | 0                  | ... | 0
...
b |  1/1/2018 | 1          | 0                  | ... | 0

Is there an efficient way to do this? I believe I can do it with for loops, but since I'm new to Python, I'd like to see if there is a more elegant way. Dates might not be adjacent or continuous (i.e. for some values of X, a specific date might not be present).

Thank you so much!

Upvotes: 0

Views: 36

Answers (2)

Matt

Reputation: 96

You could shift:

import pandas as pd
df = pd.DataFrame([1,0,0,1,1,0], columns=['day'])
df.head()

   day
0   1
1   0
2   0
3   1
4   1

df['One Day Before'] = df['day'].shift(-1)
df['One Day After'] = df['day'].shift(1)
df['Two Days before'] = df['day'].shift(-2)

df

   day  One Day Before  One Day After  Two Days before
0    1             0.0            NaN              0.0
1    0             0.0            1.0              1.0
2    0             1.0            0.0              1.0
3    1             1.0            0.0              0.0
4    1             0.0            1.0              NaN
5    0             NaN            1.0              NaN

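This copies the holiday flag (is_holiday in your data, day in this toy example) up or down by the given number of rows into a new column. Note that shift works on row position, not on the date itself, and it leaves NaN at the edges where there is no neighbouring row, so you will have to fill those (for example with 0).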
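One way to deal with the NaNs, as a minimal sketch on the same toy frame: pass fill_value to shift, so the edge rows get 0 and the columns keep their integer dtype. The column names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([1, 0, 0, 1, 1, 0], columns=['day'])

# fill_value replaces the NaN that shift would otherwise put at the edges,
# so the new columns stay integer instead of becoming float
df['One Day Before'] = df['day'].shift(-1, fill_value=0)
df['One Day After'] = df['day'].shift(1, fill_value=0)
df['Two Days Before'] = df['day'].shift(-2, fill_value=0)
```

Calling .fillna(0).astype(int) after a plain shift should give the same result.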

Upvotes: 1

Chris

Reputation: 29742

Use groupby followed by shift (pandas' GroupBy.shift), so that the shift never crosses from one X group into another:

import pandas as pd

g = df.groupby('X')['is_holiday']
df['one_day_before'] = g.shift(-1).fillna(0)
df['two_day_before'] = g.shift(-2).fillna(0)
df['one_day_after'] = g.shift(1).fillna(0)

Output:

   X      date  is_holiday  one_day_before  two_day_before  one_day_after
0  a  1/4/2018           0             0.0             1.0            0.0
1  a  1/5/2018           0             1.0             0.0            0.0
2  a  1/6/2018           1             0.0             0.0            0.0
3  a  1/7/2018           0             0.0             0.0            1.0
4  a  1/8/2018           0             0.0             0.0            0.0
5  b  1/1/2018           1             0.0             0.0            0.0
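Since shift moves by rows, the result above is only right when every X has one row per consecutive calendar day. If dates can be missing, as the question mentions, one possible alternative is to compare actual dates. The following is a rough sketch under that assumption, not a tuned solution: the helper holiday_at_offset and the sample data (with 1/5/2018 deliberately missing) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['a', 'a', 'a', 'a', 'b'],
    'date': ['1/4/2018', '1/6/2018', '1/7/2018', '1/8/2018', '1/1/2018'],
    'is_holiday': [0, 1, 0, 0, 1],
})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# One row per (X, date) pair that is a holiday
holidays = df.loc[df['is_holiday'] == 1, ['X', 'date']].drop_duplicates()

def holiday_at_offset(frame, days):
    """Return 1 where the same X has a holiday exactly `days` calendar days away."""
    shifted = frame[['X', 'date']].assign(date=frame['date'] + pd.Timedelta(days=days))
    # A left merge keeps the row order of `shifted`; the indicator column
    # reads 'both' where the shifted date matches a holiday of the same X
    merged = shifted.merge(holidays, on=['X', 'date'], how='left', indicator=True)
    return (merged['_merge'] == 'both').astype(int).to_numpy()

df['one_day_before'] = holiday_at_offset(df, 1)
df['two_day_before'] = holiday_at_offset(df, 2)
df['one_day_after'] = holiday_at_offset(df, -1)
df['two_day_after'] = holiday_at_offset(df, -2)
```

Here 1/4/2018 is flagged as two days before the holiday rather than one, because the comparison uses the calendar date and not the previous row.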

Upvotes: 1
