Sam

Reputation: 3659

Pandas: Cumulative count from two columns

winner  loser   winner_matches  loser_matches
Dave    Harry   1               1
Jim     Dave    1               2
Dave    Steve   3               1

I'm trying to build a running count of how many matches each player has participated in, based on their name appearing in either the winner or the loser column (i.e., Dave above reaches a running count of 3 because he has been in every match). I'm new to pandas and have tried a few combinations of cumcount and groupby, but I'm not sure whether I just need to loop over the dataset manually and store all the names myself.

EDIT: to clarify, I need the running totals in the dataframe as shown above and not just a Series printed out later on! Thanks

Upvotes: 0

Views: 594

Answers (2)

jezrael

Reputation: 862641

First create a Series with a MultiIndex using DataFrame.stack, then number each name's appearances with GroupBy.cumcount. To get back to a DataFrame, add unstack and rename the columns with add_suffix:

print (df)
  winner  loser
0   Dave  Harry
1    Jim   Dave
2   Dave  Steve

s = df.stack()
# if the original df has other columns, select only the two name columns first:
# s = df[['winner','loser']].stack()
df1 = s.groupby(s).cumcount().add(1).unstack().add_suffix('_matches')
print (df1)
   winner_matches  loser_matches
0               1              1
1               1              2
2               3              1

Finally, add the new columns to the original DataFrame with join:

df = df.join(df1)
print (df)
  winner  loser  winner_matches  loser_matches
0   Dave  Harry               1              1
1    Jim   Dave               1              2
2   Dave  Steve               3              1
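
For anyone who wants to run this end to end, here is a self-contained sketch of the same steps. The sample data is taken from the question; the names `counts` and `result` are only illustrative. It relies on stack going through the rows in order, so within a match the winner is counted before the loser.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "winner": ["Dave", "Jim", "Dave"],
    "loser": ["Harry", "Dave", "Steve"],
})

# Stack into a Series indexed by (row, column), going row by row
s = df[["winner", "loser"]].stack()

# Number each player's appearances in order, starting from 1
counts = s.groupby(s).cumcount().add(1)

# Reshape back to one row per match and rename the columns
df1 = counts.unstack().add_suffix("_matches")

# "result" is an illustrative name; the answer above reassigns df instead
result = df.join(df1)
print(result)
```

Because the counts are joined on the original index, this should also work when the DataFrame has extra columns, as long as only the two name columns are stacked.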

Upvotes: 1

Pyd

Reputation: 6159

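You can flatten the two columns and use value_counts. Note that this gives each player's total number of matches, not a running count per row: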

 pd.Series(df[['winner','loser']].values.flatten()).value_counts()
 [out]
 Dave     3
 Jim      1
 Harry    1
 Steve    1
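
If the running totals are needed as columns, the flattened values can probably be reused. This is a minimal sketch, not part of the original answer: it assumes the default integer index and the two name columns from the question, and the names `flat`, `running`, `out` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "winner": ["Dave", "Jim", "Dave"],
    "loser": ["Harry", "Dave", "Steve"],
})

cols = ["winner", "loser"]

# Flatten row by row: Dave, Harry, Jim, Dave, Dave, Steve
flat = pd.Series(df[cols].to_numpy().ravel())

# Running count of each name's appearances, starting from 1
running = flat.groupby(flat).cumcount() + 1

# Reshape back to one row per match with one column per role
out = pd.DataFrame(
    running.to_numpy().reshape(len(df), len(cols)),
    index=df.index,
    columns=[c + "_matches" for c in cols],
)

result = df.join(out)
print(result)
```

The reshape step depends on ravel flattening in row-major order, so each pair of consecutive values maps back to one match.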

Upvotes: 0
