Revolucion for Monica
Revolucion for Monica

Reputation: 3326

How to group rows, count in one column and do the sum in the other?

I want to group rows of a csv file, count in one column and add in the other.

For example with the following I would like to group the lines on the Commune to make columns of the winner with the count and a column Swing with the sum

Commune Winner Swing longitude latitude turnout
Paris   PAM    1     12.323    12.093   0.3242
Paris   PJD    0     12.323    12.093   0.1233
Paris   PAM    1     12.323    12.093   0.534
Paris   UDF    1     12.323    12.093   0.65434
Madrid  PAM    0     10.435    -3.093   0.3423
Madrid  PAM    1     10.435    -3.093   0.5234
Madrid  PJD    0     10.435    -3.093   0.235

How to group rows, have a column in one column and a sum in the other?

Commune PAM    PJD    UDF    Swing
Paris   3      1      1      3
Madrid  2      1      0      1

So far I tried try :

g = df.groupby('Commune').Winner
pd.concat([g.apply(list), g.count()], axis=1, keys=['members', 'number'])

But it returns:

    members number
Commune     
Paris   [PAM, PJD, PAM, UDF] 4
Madrid  [PAM, PAM, UDF] 3

Upvotes: 2

Views: 412

Answers (3)

jezrael
jezrael

Reputation: 863741

Use crosstab and add new column with DataFrame.join and aggregate sum:

df = pd.crosstab(df['Commune'], df['Winner']).join(df.groupby('Commune')['Swing'].sum())
print (df)
         PAM  PJD  UDF  Swing
Commune                      
Madrid     2    1    0      1
Paris      2    1    1      3

But if need counts of rows:

df1 = pd.crosstab(df['Commune'], df['Winner'], margins=True, margins_name='Total').iloc[:-1]

Or:

df = pd.crosstab(df['Commune'], df['Winner']).assign(Total= lambda x: x.sum(axis=1))

print (df1)
Winner   PAM  PJD  UDF  Total
Commune                      
Madrid     2    1    0      3
Paris      2    1    1      4

EDIT:

If another columns then is possible use aggregation by first if all values per groups and for turnout use some another aggregate function like mean, sum...:

df1 = (df.groupby('Commune')
         .agg({'Swing':'sum', 'longitude':'first','latitude':'first','turnout':'mean'}))
print (df1)
         Swing  longitude  latitude  turnout
Commune                                     
Madrid       1     10.435    -3.093  0.36690
Paris        3     12.323    12.093  0.40896

df = pd.crosstab(df['Commune'], df['Winner']).join(df1)
print (df)
         PAM  PJD  UDF  Swing  longitude  latitude  turnout
Commune                                                    
Madrid     2    1    0      1     10.435    -3.093  0.36690
Paris      2    1    1      3     12.323    12.093  0.40896

If want mean of all columns without Swing is possible create dictionary dynamically:

d = dict.fromkeys(df.columns.difference(['Commune','Winner','Swing']), 'mean')
d['Swing'] = 'sum'
print (d)
{'latitude': 'mean', 'longitude': 'mean', 'turnout': 'mean', 'Swing': 'sum'}

df1 = df.groupby('Commune').agg(d)
print (df1)
         latitude  longitude  turnout  Swing
Commune                                     
Madrid     -3.093     10.435  0.36690      1
Paris      12.093     12.323  0.40896      3

df = pd.crosstab(df['Commune'], df['Winner']).join(df1)
print (df)
         PAM  PJD  UDF  latitude  longitude  turnout  Swing
Commune                                                    
Madrid     2    1    0    -3.093     10.435  0.36690      1
Paris      2    1    1    12.093     12.323  0.40896      3

Upvotes: 3

Freddy
Freddy

Reputation: 1

This is how i did it.

df_a = pd.pivot_table(df, values='Swing', index='Commune', columns='Winner', aggfunc='count', fill_value =0)
df_b = df.groupby('Commune')[['Swing']].sum()
output_df = df_a.join(df_b)

Upvotes: 0

zipa
zipa

Reputation: 27899

This should do it:

pd.pivot_table(df, values='Swing', index='Commune', columns='Winner', aggfunc='count').fillna(0).join(df.groupby('Commune')['Swing'].sum())

#         PAM  PJD  UDF  Swing
#Commune                      
#Madrid   2.0  1.0  0.0      1
#Paris    2.0  1.0  1.0      3

Upvotes: 2

Related Questions