timothyylim

Reputation: 1527

Split a dataframe and sum [pandas]

I have the following dataframe (dummy data):

            score   GDP
country     
Bangladesh  6      12
Bolivia     4      10
Nigeria     3      9
Pakistan    2      3
Ghana       1      3
India       1      3
Algeria     1      3

I want to split it into two groups based on GDP and sum the score of each group. Splitting on GDP less than 9 (those countries are "poor", the rest are "rich"), the expected result is:

           sum_score
country     
rich       13      
poor        5     

Upvotes: 1

Views: 216

Answers (2)

jezrael

Reputation: 862591

You can group by a boolean mask and then rename the index. The mask is True where GDP < 9, so True maps to 'poor' and False to 'rich':

a = df.groupby(df.GDP < 9)['score'].sum().rename({True: 'poor', False: 'rich'})
print(a)
GDP
rich    13
poor     5
Name: score, dtype: int64

Finally, to get a one-column DataFrame, use Series.to_frame:

df = a.to_frame('sum_score')
print(df)
      sum_score
GDP            
rich         13
poor          5

Upvotes: 1

sacuL

Reputation: 51335

You can use np.where to create the rich and poor categories, then group by that category and take the sum (this needs import numpy as np):

df['country_cat'] = np.where(df.GDP < 9, 'poor', 'rich')
df.groupby('country_cat')['score'].sum()

country_cat
poor     5
rich    13

You can also do the same in one step, by not creating the extra column for the category (but IMO the code becomes less readable):

df.groupby(np.where(df.GDP < 9, 'poor', 'rich'))['score'].sum()
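
As a rough, self-contained sketch of the one-step version, the snippet below rebuilds the dummy data from the question and then reshapes the result to look like the requested output. The reordering with `.loc`, the `rename_axis("country")` call and the `sum_score` column name are assumptions taken from the layout shown in the question, not something the grouping itself requires.

```python
import numpy as np
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame(
    {"score": [6, 4, 3, 2, 1, 1, 1], "GDP": [12, 10, 9, 3, 3, 3, 3]},
    index=pd.Index(
        ["Bangladesh", "Bolivia", "Nigeria", "Pakistan", "Ghana", "India", "Algeria"],
        name="country",
    ),
)

# np.where builds one label per row; groupby accepts that array directly.
result = (
    df.groupby(np.where(df["GDP"] < 9, "poor", "rich"))["score"]
    .sum()
    .loc[["rich", "poor"]]      # reorder to match the question's layout
    .rename_axis("country")     # index name used in the question's output
    .to_frame("sum_score")
)
print(result)
```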

Upvotes: 3
