Reputation: 4511
df.head()
               Populous     Continents
Australia  2.331602e+07      Australia
Brazil     2.059153e+08  South America
Canada     3.523986e+07  North America
China      1.367645e+09           Asia
France     6.383735e+07         Europe
Above are the first 5 entries of my dataframe.
I want to group them by Continents and then perform some statistical analysis. Specifically, I want to create a new dataframe whose columns are the Avg, Sum and STD of each group's Populous, as well as the count of countries in each group.
new_df = df.groupby('Continents')['Populous'].agg({'Avg': np.average, 'Sum': np.sum, 'STD': np.std})
takes care of three columns, but I don't know how to get the count in there. I tried including 'Size': count within the agg method, but it resulted in an error.
Thank you.
Upvotes: 1
Views: 1004
Reputation: 294298
You might also find this useful:
df.groupby('Continents').Populous.describe().unstack()
Also see this answer if you want more stats.
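A minimal sketch of the describe() approach, using only the five rows shown in the question (so every continent has a single country, count is 1 and std is NaN). It assumes a current pandas release, where describe() on the grouped column already returns one row per continent with count, mean, std, min, quartiles and max as columns; on such versions the trailing unstack() is not needed and would instead flatten that table into one long Series.

```python
import pandas as pd

# Sample rows copied from the question; the real dataframe has more countries.
df = pd.DataFrame(
    {
        "Populous": [2.331602e07, 2.059153e08, 3.523986e07, 1.367645e09, 6.383735e07],
        "Continents": ["Australia", "South America", "North America", "Asia", "Europe"],
    },
    index=["Australia", "Brazil", "Canada", "China", "France"],
)

# Assumes a current pandas release: describe() on a SeriesGroupBy returns a
# DataFrame with one row per continent and the summary statistics as columns.
stats = df.groupby("Continents")["Populous"].describe()
print(stats)
```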
Upvotes: 2
Reputation: 6663
You can use 'Size': len or 'Size': 'count' for this to work. However, as @DSM pointed out, len does count missing values whereas 'count' doesn't.
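A sketch of the same aggregation written with named aggregation (available since pandas 0.25, to my knowledge), since newer pandas versions reject the dict-of-renamers form used in the question for a single grouped column. It uses the question's rows plus one hypothetical Europe row with a missing population, added only to show the len vs 'count' difference.

```python
import numpy as np
import pandas as pd

# Question's sample rows plus one HYPOTHETICAL Europe row with a missing value.
df = pd.DataFrame(
    {
        "Populous": [2.331602e07, 2.059153e08, 3.523986e07, 1.367645e09, 6.383735e07, np.nan],
        "Continents": ["Australia", "South America", "North America", "Asia", "Europe", "Europe"],
    },
    index=["Australia", "Brazil", "Canada", "China", "France", "Hypothetical"],
)

# Named aggregation: output_column=(input_column, function).
new_df = df.groupby("Continents").agg(
    Avg=("Populous", "mean"),
    Sum=("Populous", "sum"),
    STD=("Populous", "std"),      # pandas' std is the sample std (ddof=1)
    Size=("Populous", len),       # all rows in the group, NaN included
    Count=("Populous", "count"),  # non-missing values only
)
print(new_df)
```

Here Europe gets Size 2 but Count 1, because len counts the NaN row while 'count' skips it.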
Upvotes: 1