I have the following dataframe (dummy data):
            score  GDP
country
Bangladesh      6   12
Bolivia         4   10
Nigeria         3    9
Pakistan        2    3
Ghana           1    3
India           1    3
Algeria         1    3
I want to split it into two groups based on GDP and sum the score within each group, where a country counts as poor if its GDP is less than 9 and rich otherwise. Expected result:
         sum_score
country
rich            13
poor             5
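To reproduce this, the dummy data above can be built with a short pandas snippet. This is a sketch that assumes pandas is installed and takes the index name country from the printout:

```python
import pandas as pd

# Dummy data from the question, indexed by country name
df = pd.DataFrame(
    {
        "score": [6, 4, 3, 2, 1, 1, 1],
        "GDP": [12, 10, 9, 3, 3, 3, 3],
    },
    index=pd.Index(
        ["Bangladesh", "Bolivia", "Nigeria", "Pakistan", "Ghana", "India", "Algeria"],
        name="country",
    ),
)
print(df)
```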
You can group by a boolean mask and then rename the index labels. The mask df.GDP < 9 is True for the poor countries, so True maps to 'poor':
a = df.groupby(df.GDP < 9)['score'].sum().rename({True: 'poor', False: 'rich'})
print(a)
GDP
rich    13
poor     5
Name: score, dtype: int64
Finally, to get a one-column DataFrame, add Series.to_frame:
df = a.to_frame('sum_score')
print(df)
      sum_score
GDP
rich         13
poor          5
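Here is a self-contained sketch of the boolean-mask approach that rebuilds the dummy data from the question. The names is_poor and result are only illustrative. Because groupby sorts its keys, False (rich) comes before True (poor) in the result.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame(
    {"score": [6, 4, 3, 2, 1, 1, 1], "GDP": [12, 10, 9, 3, 3, 3, 3]},
    index=pd.Index(
        ["Bangladesh", "Bolivia", "Nigeria", "Pakistan", "Ghana", "India", "Algeria"],
        name="country",
    ),
)

# Boolean mask: True means GDP < 9, i.e. the "poor" group
is_poor = df["GDP"] < 9

# Sum scores per mask value, then turn the True/False labels into names
a = df.groupby(is_poor)["score"].sum().rename({True: "poor", False: "rich"})

# One-column DataFrame, as in the expected output
result = a.to_frame("sum_score")
print(result)
```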
You can use np.where to make your rich and poor categories, then groupby that category and get the sum:
import numpy as np
df['country_cat'] = np.where(df.GDP < 9, 'poor', 'rich')
df.groupby('country_cat')['score'].sum()
country_cat
poor     5
rich    13
You can also do the same in one step, without creating the extra column for the category (though IMO the code becomes less readable):
df.groupby(np.where(df.GDP < 9, 'poor', 'rich'))['score'].sum()
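For comparison, here is a self-contained sketch of the one-step np.where version, again rebuilding the dummy data. np.where returns a plain NumPy array of labels with the same length as df, and groupby accepts that array directly as a grouping key. The names labels and totals are only illustrative.

```python
import numpy as np
import pandas as pd

# Dummy data from the question
df = pd.DataFrame(
    {"score": [6, 4, 3, 2, 1, 1, 1], "GDP": [12, 10, 9, 3, 3, 3, 3]},
    index=pd.Index(
        ["Bangladesh", "Bolivia", "Nigeria", "Pakistan", "Ghana", "India", "Algeria"],
        name="country",
    ),
)

# One label per row: 'poor' where GDP < 9, otherwise 'rich'
labels = np.where(df["GDP"] < 9, "poor", "rich")

# Group rows by the label array (no extra column needed) and sum the scores
totals = df.groupby(labels)["score"].sum()
print(totals)
```

The labels come out sorted alphabetically (poor, then rich), which differs from the order in the question's expected output but holds the same totals.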