Reputation: 3326
I want to group rows of a csv file, count in one column and add in the other.
For example with the following I would like to group the lines on the Commune
to make columns of the winner
with the count and a column Swing
with the sum
Commune Winner Swing longitude latitude turnout
Paris PAM 1 12.323 12.093 0.3242
Paris PJD 0 12.323 12.093 0.1233
Paris PAM 1 12.323 12.093 0.534
Paris UDF 1 12.323 12.093 0.65434
Madrid PAM 0 10.435 -3.093 0.3423
Madrid PAM 1 10.435 -3.093 0.5234
Madrid PJD 0 10.435 -3.093 0.235
How to group rows, have a column in one column and a sum in the other?
Commune PAM PJD UDF Swing
Paris 3 1 1 3
Madrid 2 1 0 1
So far I tried try :
g = df.groupby('Commune').Winner
pd.concat([g.apply(list), g.count()], axis=1, keys=['members', 'number'])
But it returns:
members number
Commune
Paris [PAM, PJD, PAM, UDF] 4
Madrid [PAM, PAM, UDF] 3
Upvotes: 2
Views: 412
Reputation: 863741
Use crosstab
and add new column with DataFrame.join
and aggregate sum
:
df = pd.crosstab(df['Commune'], df['Winner']).join(df.groupby('Commune')['Swing'].sum())
print (df)
PAM PJD UDF Swing
Commune
Madrid 2 1 0 1
Paris 2 1 1 3
But if need counts of rows:
df1 = pd.crosstab(df['Commune'], df['Winner'], margins=True, margins_name='Total').iloc[:-1]
Or:
df = pd.crosstab(df['Commune'], df['Winner']).assign(Total= lambda x: x.sum(axis=1))
print (df1)
Winner PAM PJD UDF Total
Commune
Madrid 2 1 0 3
Paris 2 1 1 4
EDIT:
If another columns then is possible use aggregation by first
if all values per groups and for turnout
use some another aggregate function like mean
, sum
...:
df1 = (df.groupby('Commune')
.agg({'Swing':'sum', 'longitude':'first','latitude':'first','turnout':'mean'}))
print (df1)
Swing longitude latitude turnout
Commune
Madrid 1 10.435 -3.093 0.36690
Paris 3 12.323 12.093 0.40896
df = pd.crosstab(df['Commune'], df['Winner']).join(df1)
print (df)
PAM PJD UDF Swing longitude latitude turnout
Commune
Madrid 2 1 0 1 10.435 -3.093 0.36690
Paris 2 1 1 3 12.323 12.093 0.40896
If want mean
of all columns without Swing
is possible create dictionary dynamically:
d = dict.fromkeys(df.columns.difference(['Commune','Winner','Swing']), 'mean')
d['Swing'] = 'sum'
print (d)
{'latitude': 'mean', 'longitude': 'mean', 'turnout': 'mean', 'Swing': 'sum'}
df1 = df.groupby('Commune').agg(d)
print (df1)
latitude longitude turnout Swing
Commune
Madrid -3.093 10.435 0.36690 1
Paris 12.093 12.323 0.40896 3
df = pd.crosstab(df['Commune'], df['Winner']).join(df1)
print (df)
PAM PJD UDF latitude longitude turnout Swing
Commune
Madrid 2 1 0 -3.093 10.435 0.36690 1
Paris 2 1 1 12.093 12.323 0.40896 3
Upvotes: 3
Reputation: 1
This is how i did it.
df_a = pd.pivot_table(df, values='Swing', index='Commune', columns='Winner', aggfunc='count', fill_value =0)
df_b = df.groupby('Commune')[['Swing']].sum()
output_df = df_a.join(df_b)
Upvotes: 0
Reputation: 27899
This should do it:
pd.pivot_table(df, values='Swing', index='Commune', columns='Winner', aggfunc='count').fillna(0).join(df.groupby('Commune')['Swing'].sum())
# PAM PJD UDF Swing
#Commune
#Madrid 2.0 1.0 0.0 1
#Paris 2.0 1.0 1.0 3
Upvotes: 2