How do I group by a specific column in pandas and apply statistics to it?

Question

I have a dataframe with several columns:

'Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', ...

First I had to add two columns, "Continent" and "PopEst" (estimated population).

Now I am asked to create a new dataframe with the continents as the index and the columns ['size', 'sum', 'mean', 'std'].

I'm sure there is a really simple solution... ;-(

I tried several things after reading a lot online, but I can't seem to find a solution. My idea was to create a new pandas DataFrame like this:

Continents=Top15.groupby('Continent')[['PopEst']]

Unfortunately, this is what I get when I try to print it:

If I do

print(Continents.size())

I get this, which looked promising:

Continent
Asia             5
Australia        1
Europe           6
North America    2
South America    1
dtype: int64

Unfortunately, this only works for .sum and .size; .mean and .std raise the following error:

DataError: No numeric types to aggregate

And my idea of using this (i.e. adding columns to my newly created dataframe)

Continents['size']=Continents.size()

gives me this error:

TypeError: 'DataFrameGroupBy' object does not support item assignment

I am sure this can be done in 2-3 lines of code and would love to know how it works.

Can anyone point me to the correct solution?

Thanks.

BENY · Accepted Answer

Seems like you want to keep all the other columns:

Top15.assign(sizeofg=Top15.groupby('Continent')['PopEst'].transform('size')).\
        drop_duplicates('Continent')
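
To see what the transform approach does, here is a minimal sketch on a made-up frame. The name toy and its values are hypothetical stand-ins for Top15, not the real data. transform('size') returns one value per original row, so it can be attached as a column before the duplicates are dropped.

```python
import pandas as pd

# Hypothetical stand-in for Top15; the values are made up for illustration.
toy = pd.DataFrame({
    "Country": ["China", "Japan", "Germany", "France", "Brazil"],
    "Continent": ["Asia", "Asia", "Europe", "Europe", "South America"],
    "PopEst": [1.37e9, 1.27e8, 8.1e7, 6.4e7, 2.06e8],
})

# transform('size') yields, for every row, the number of rows in that row's group.
group_size = toy.groupby("Continent")["PopEst"].transform("size")

# Attach the group size as a column, then keep the first row seen per continent.
first_rows = toy.assign(sizeofg=group_size).drop_duplicates("Continent")
print(first_rows)
```

Note that the other columns in the result come from whichever row appears first for each continent, so they describe one country, not the whole group.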

EDIT: You need agg:

Top15.groupby('Continent')['PopEst'].agg(['sum','mean','count'])
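
For the exact columns asked for in the question, the same call can probably be given ['size', 'sum', 'mean', 'std']. The "No numeric types to aggregate" error suggests PopEst may be stored as strings (object dtype). That is an assumption, since the question does not show the dtypes. Under that assumption, a sketch with made-up data (toy is hypothetical) would convert the column first:

```python
import pandas as pd

# Hypothetical stand-in for Top15; PopEst is deliberately stored as strings
# to mimic the assumed cause of the DataError.
toy = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Europe", "Europe", "South America"],
    "PopEst": ["100", "300", "10", "30", "50"],
})

# Convert to a numeric dtype so that mean and std can be computed.
toy["PopEst"] = pd.to_numeric(toy["PopEst"])

# One row per continent, one column per statistic.
stats = toy.groupby("Continent")["PopEst"].agg(["size", "sum", "mean", "std"])
print(stats)
```

The result is an ordinary DataFrame indexed by Continent, so no item assignment on the groupby object is needed. A group with a single row gets NaN for std, because the sample standard deviation is undefined for one value.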
