Pi-R
Pi-R

Reputation: 654

how to reset a cumulative sum based on date

I have a dataframe looking like df:

date   way 
date_1 A
date_1 B
date_1 A 
date_2 A 
date_2 A 
date_2 A

I want to add a journey column based on theses conditions : it's a cumulative sum that increment when way or date are not the same from a row to the next one and reset the cum sum when date are changing

I already have :

(df['journey']=df['date'].ne(df['date'].shift())) |(df['way'].ne(df['way'].shift()))).cumsum()

which get the cumulative sum when date or way are changing. How can i add the reset when date are changing ?

the expected result is :

date   way journey
date_1 A    1
date_1 B    2
date_1 A    3
date_2 A    1
date_2 A    1
date_2 A    1

Upvotes: 1

Views: 346

Answers (1)

jezrael
jezrael

Reputation: 863301

Use:

df['journey']= df['way'].ne(df['way'].shift().bfill()).groupby(df['date']).cumsum().add(1)
print (df)
     date way  journey
0  date_1   A        1
1  date_1   B        2
2  date_1   A        3
3  date_2   A        1
4  date_2   A        1
5  date_2   A        1

Or:

df['journey'] = (df[['date', 'way']].ne(df[['date', 'way']].shift())
                   .any(axis=1).groupby(df['date']).cumsum())

print (df)
     date way  journey
0  date_1   A        1
1  date_1   B        2
2  date_1   A        3
3  date_2   A        1
4  date_2   A        1
5  date_2   A        1

Your solution:

df['journey'] = ((df['date'].ne(df['date'].shift()) |(df['way'].ne(df['way'].shift())))
                   .groupby(df['date']).cumsum())

print (df)
     date way  journey
0  date_1   A        1
1  date_1   B        2
2  date_1   A        3
3  date_2   A        1
4  date_2   A        1
5  date_2   A        1

Upvotes: 1

Related Questions