Reputation: 53
I have a df containing employee , worked_days and sold columns Some employee sold only for first day and after five days sold another
My data look like this
data = {'id':[1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
'days':[1, 3, 3, 8, 8,8, 3, 8, 8, 9, 9, 12],
'sold':[1, 0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1]}
df = pd.DataFrame(data)
df['days_lag'] = df.groupby('id')['days'].diff().fillna(0).astype('int16')
Gives me this
id days sold days_lag
0 1 1 1 0
1 1 3 0 2
2 1 3 1 0
3 1 8 1 5
4 1 8 1 0
5 1 8 0 0
6 2 3 0 0
7 2 8 1 5
8 2 8 1 0
9 2 9 2 1
10 2 9 0 0
11 2 12 1 3
I want the results to be like below
id days sold days_lag
0 1 1 1 0
1 1 3 0 2
2 1 3 1 2
3 1 8 1 5
4 1 8 1 5
5 1 8 0 5
6 2 3 0 0
7 2 8 1 5
8 2 8 1 5
9 2 9 2 1
10 2 9 0 1
11 2 12 1 3
How can i achieve this ? Thanks
Upvotes: 3
Views: 74
Reputation: 34076
Use Groupby.transform
:
In [92]: df['days_lag'] = df.groupby('id')['days'].diff().fillna(0).astype('int16')
In [96]: df['days_lag'] = df.groupby(['id', 'days'])['days_lag'].transform('max')
In [97]: df
Out[97]:
id days sold days_lag
0 1 1 1 0
1 1 3 0 2
2 1 3 1 2
3 1 8 1 5
4 1 8 1 5
5 1 8 0 5
6 2 3 0 0
7 2 8 1 5
8 2 8 1 5
9 2 9 2 1
10 2 9 0 1
11 2 12 1 3
Upvotes: 2