Reputation: 459
I have a dataset:
I need to compare the index of each month (within each id group) with the two previous months and check whether it is the same.
This is the code I'm trying:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id':[1,1,1,1,2,2,2,2,3,3,3,3],
'month':[202001,202002,202003,202004,202001,202002,202003,202004,202001,202002,202003,202004,],
'index':[3, 3, 3, 3, 4, 4, 5, 5, 2, 3, 3, 3]})
df['check']=np.where(df.sort_values(['id', 'month'])
.groupby('id')['index']
.apply(lambda x: x.shift(3))
.transform('nunique')>1,1,0)
It raises an error: ValueError: transforms cannot produce aggregated results
Without "apply" the code works. What am I doing wrong?
The desired output is something like this:
Thank you for any suggestions
Upvotes: 2
Views: 91
Reputation: 75100
You can try groupby.shift, subtract the shifted values from index, and then clip the upper limit to 1:
df['check'] = df['index'].sub(df.groupby("id")['index'].shift(2)).clip(upper=1)
print(df)
id month index check
0 1 202001 3 NaN
1 1 202002 3 NaN
2 1 202003 3 0.0
3 1 202004 3 0.0
4 2 202001 4 NaN
5 2 202002 4 NaN
6 2 202003 5 1.0
7 2 202004 5 1.0
8 3 202001 2 NaN
9 3 202002 3 NaN
10 3 202003 3 1.0
11 3 202004 3 0.0
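Note that the subtraction above only looks at the value two months back, and a decrease in index would come out as a negative number rather than 1. A minimal sketch of a variant, assuming "the same" means the current month equals both of the two preceding months within the same id (the names prev1, prev2 and changed are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'month': [202001, 202002, 202003, 202004] * 3,
    'index': [3, 3, 3, 3, 4, 4, 5, 5, 2, 3, 3, 3],
})

# Sort so that shift() walks backwards in time inside each id
df = df.sort_values(['id', 'month'])
grouped = df.groupby('id')['index']

prev1 = grouped.shift(1)  # value one month earlier
prev2 = grouped.shift(2)  # value two months earlier

# 1.0 if the current value differs from either previous month, else 0.0
changed = (df['index'].ne(prev1) | df['index'].ne(prev2)).astype(float)

# Rows without two previous months in their group stay missing (NaN)
df['check'] = changed.where(prev2.notna())
print(df)
```

For the sample data this should give the same check column as above; the difference only shows up when index drops or when only the middle month differs.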
Upvotes: 2
Reputation: 30050
You can simply compare each group with its shifted version inside transform:
df['check'] = df.sort_values(['id', 'month']).groupby('id')['index'].transform(lambda group: group != group.shift(2)).astype(int)
print(df)
id month index check
0 1 202001 3 1
1 1 202002 3 1
2 1 202003 3 0
3 1 202004 3 0
4 2 202001 4 1
5 2 202002 4 1
6 2 202003 5 1
7 2 202004 5 1
8 3 202001 2 1
9 3 202002 3 1
10 3 202003 3 1
11 3 202004 3 0
Then change the first two rows of each group in the check column to '':
df['check'] = df.sort_values(['id', 'month']).groupby('id')['check'].transform(lambda group: ['', '']+list(group[2:]))
print(df)
id month index check
0 1 202001 3
1 1 202002 3
2 1 202003 3 0
3 1 202004 3 0
4 2 202001 4
5 2 202002 4
6 2 202003 5 1
7 2 202004 5 1
8 3 202001 2
9 3 202002 3
10 3 202003 3 1
11 3 202004 3 0
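As for the error in the question: as far as I can tell, the result of groupby(...).apply(...) is an ordinary Series that is no longer grouped, so the .transform('nunique') that follows is Series.transform, which rejects functions that reduce to a single value. A small sketch of that behaviour, reusing the question's data (the exact error message may differ between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'month': [202001, 202002, 202003, 202004] * 3,
    'index': [3, 3, 3, 3, 4, 4, 5, 5, 2, 3, 3, 3],
})

# After apply() the grouping is gone: this is just a Series
shifted = (df.sort_values(['id', 'month'])
             .groupby('id')['index']
             .apply(lambda x: x.shift(3)))

error_message = None
try:
    # Series.transform refuses a reducing function such as 'nunique'
    shifted.transform('nunique')
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# On the grouped object, transform broadcasts the per-group result back to every row
per_group = df.groupby('id')['index'].transform('nunique')
print(per_group.tolist())
```

That would also explain why the code runs once apply is removed: transform is then called on the grouped object instead of on a plain Series.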
Upvotes: 1