Reputation: 666
My goal is to generate an id (a trajectory id) for each group of u_uuid and p_uuid, plus a sub id within each trajectory.
I tried the ngroup function, but it did not produce the result I want:
import pandas as pd

data = [
{'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
{'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
{'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'bus', 'dest': 'work'},
{'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'bus', 'dest': 'work'},
{'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
{'u_uuid': 110, 'p_uuid': 'bbb', 'mode': 'walk', 'dest': 'home'},
{'u_uuid': 110, 'p_uuid': 'bbb', 'mode': 'bus', 'dest': 'home'},
{'u_uuid': 110, 'p_uuid': 'bbb', 'mode': 'bus', 'dest': 'home'},
{'u_uuid': 110, 'p_uuid': 'bbb', 'mode': 'walk', 'dest': 'home'}]
df = pd.DataFrame(data)
df['id'] = df.groupby(['u_uuid', 'p_uuid', 'dest']).ngroup()
df['sub_id'] = df.groupby(['u_uuid', 'p_uuid', 'mode']).ngroup()
My dataframe:
What I'm looking for:
Upvotes: 1
Views: 66
Reputation: 30920
Use:
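# Trajectory id: number each (u_uuid, p_uuid, dest) group in order of appearance, starting at 1
s1 = df.groupby(['u_uuid', 'p_uuid', 'dest'], sort=False).ngroup().add(1)
# Number each run of consecutive equal 'mode' values within (u_uuid, p_uuid)
s2 = df.groupby(['u_uuid', 'p_uuid',
                 df['mode'].ne(df['mode'].shift()).cumsum()], sort=False).ngroup()
# Subtract the run number at the first row of each trajectory so sub_id restarts at 1
df['sub_id'] = s2.sub(s2.where(s1.ne(s1.shift())).ffill()).add(1).astype(int)
df['id'] = s1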
print(df)
   u_uuid p_uuid  mode  dest  sub_id  id
0     110    aaa  walk  work       1   1
1     110    aaa  walk  work       1   1
2     110    aaa   bus  work       2   1
3     110    aaa   bus  work       2   1
4     110    aaa  walk  work       3   1
5     110    bbb  walk  home       1   2
6     110    bbb   bus  home       2   2
7     110    bbb   bus  home       2   2
8     110    bbb  walk  home       3   2
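An alternative that may be easier to read is to count mode changes inside each group directly. This is only a sketch, assuming a trajectory is identified by u_uuid and p_uuid alone (the code above also groups by dest) and that rows are already in travel order. The names keys, prev_mode and changed are introduced here for illustration.

```python
import pandas as pd

data = [
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'bus', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'bus', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'aaa', 'mode': 'walk', 'dest': 'work'},
    {'u_uuid': 110, 'p_uuid': 'bbb', 'mode': 'walk', 'dest': 'home'},
    {'u_uuid': 110, 'p_uuid': 'bbb', 'mode': 'bus', 'dest': 'home'},
    {'u_uuid': 110, 'p_uuid': 'bbb', 'mode': 'bus', 'dest': 'home'},
    {'u_uuid': 110, 'p_uuid': 'bbb', 'mode': 'walk', 'dest': 'home'},
]
df = pd.DataFrame(data)

# Assumption: a trajectory is one (u_uuid, p_uuid) pair
keys = ['u_uuid', 'p_uuid']

# Trajectory id, numbered from 1 in order of first appearance
df['id'] = df.groupby(keys, sort=False).ngroup() + 1

# Previous row's mode within the same trajectory (NaN on each first row)
prev_mode = df.groupby(keys, sort=False)['mode'].shift()

# 1 where the mode differs from the previous row of the trajectory, else 0;
# the first row of each trajectory compares against NaN, so it counts as a change
changed = df['mode'].ne(prev_mode).astype(int)

# Running count of changes per trajectory gives a sub id that restarts at 1
df['sub_id'] = changed.groupby([df[k] for k in keys], sort=False).cumsum()

print(df)
```

Because the shift and the cumulative sum are both done per group, a new trajectory always starts at sub_id 1 without the subtraction step used above.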
Upvotes: 3