Pandas groupby cumcount - one cumulative count rather than a cumulative count for each unique value

Question

Let's say I have a df

pd.DataFrame(
    {'name':['pam','pam','bob','bob','pam','bob','pam','bob'],
     'game_id':[0,0,1,1,0,2,1,2]
     }
    )


   name game_id
0   pam 0
1   pam 0
2   bob 1
3   bob 1
4   pam 0
5   bob 2
6   pam 1
7   bob 2

I want to calculate how many games bob and amy have appeared in cumulatively. However, when I use .groupby() and .cumcount()+1, I get something different. I get a cumulative count within each game_id:

df['games'] = df.groupby(['name','game_id']).cumcount()+1

    name game_id games
0   pam 0   1
1   pam 0   2
2   bob 1   1
3   bob 1   2
4   pam 0   3
5   bob 2   1
6   pam 1   1
7   bob 2   2

When what I really want is a one total cumulative count rather than a cumulative count for each unique game_id. Here's an example of my desired output:

    name game_id games
0   pam 0   1
1   pam 0   1
2   bob 1   1
3   bob 1   1
4   pam 0   1
5   bob 2   2
6   pam 1   2
7   bob 2   2

Note, in my actual dataset game_id is a random sequence of numbers.

wwnde · Accepted Answer

Lets try sort df, check consecutive difference, create new group by cumsum and then resort the df

new_df=df.sort_values(by=['name','game_id'])
new_df=new_df.assign(rank=new_df['game_id']!=new_df['game_id'].shift())
new_df=new_df.assign(rank=new_df.groupby('name')['rank'].cumsum()).sort_index()
print(new_df)



 name  game_id  rank
0  pam        0     1
1  pam        0     1
2  bob        1     1
3  bob        1     1
4  pam        0     1
5  bob        2     2
6  pam        1     2
7  bob        2     2

Pandas groupby cumcount - one cumulative count rather than a cumulative count for each unique value

Answers (2)

Related Questions