Reputation: 654
I have a dataframe looking like df
:
date way
date_1 A
date_1 B
date_1 A
date_2 A
date_2 A
date_2 A
I want to add a journey
column based on theses conditions : it's a cumulative sum that increment when way
or date
are not the same from a row to the next one and reset the cum sum when date
are changing
I already have :
(df['journey']=df['date'].ne(df['date'].shift())) |(df['way'].ne(df['way'].shift()))).cumsum()
which get the cumulative sum when date
or way
are changing. How can i add the reset when date
are changing ?
the expected result is :
date way journey
date_1 A 1
date_1 B 2
date_1 A 3
date_2 A 1
date_2 A 1
date_2 A 1
Upvotes: 1
Views: 346
Reputation: 863301
Use:
df['journey']= df['way'].ne(df['way'].shift().bfill()).groupby(df['date']).cumsum().add(1)
print (df)
date way journey
0 date_1 A 1
1 date_1 B 2
2 date_1 A 3
3 date_2 A 1
4 date_2 A 1
5 date_2 A 1
Or:
df['journey'] = (df[['date', 'way']].ne(df[['date', 'way']].shift())
.any(axis=1).groupby(df['date']).cumsum())
print (df)
date way journey
0 date_1 A 1
1 date_1 B 2
2 date_1 A 3
3 date_2 A 1
4 date_2 A 1
5 date_2 A 1
Your solution:
df['journey'] = ((df['date'].ne(df['date'].shift()) |(df['way'].ne(df['way'].shift())))
.groupby(df['date']).cumsum())
print (df)
date way journey
0 date_1 A 1
1 date_1 B 2
2 date_1 A 3
3 date_2 A 1
4 date_2 A 1
5 date_2 A 1
Upvotes: 1