Groupby using column and index and then sum to create new column

Question

Suppose I have the following DataFrame:

import pandas as pd

group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
data = {'population': [100, 200, 300, 400, 500, 600, 700, 800],
        'city_name': ['Chicago', 'Chicago', 'New York', 'New York', 'Chicago', 'New York', 'Chicago', 'New York'],
       }
df = pd.DataFrame(data, index=group)


    city_name   population
A   Chicago      100
A   Chicago      200
A   New York     300
A   New York     400
B   Chicago      500
B   New York     600
B   Chicago      700
B   New York     800

I want to sum population within each group (grouped by the index and city_name) and store the result as a new column in the same DataFrame. For example, I would like a DataFrame that looks like this:

    city_name   population   population_summed
A   Chicago      100             300
A   Chicago      200             300
A   New York     300             700
A   New York     400             700
B   Chicago      500             1200
B   New York     600             1400
B   Chicago      700             1200
B   New York     800             1400

I'm having trouble because I'm not sure how to use groupby with both an index and a column.

Alex Riley · Accepted Answer

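You can pass both the index and the column name to groupby as a list: [df.index, 'city_name']. Then select the population column and call .transform('sum') on it. Unlike .sum(), which returns one row per group, transform returns a Series with the same index as the original DataFrame, so each row receives its group's total and the result can be assigned directly as a new column: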

df['population_summed'] = df.groupby([df.index, 'city_name'])['population'].transform('sum')

This gives:

  city_name  population  population_summed
A   Chicago         100                300
A   Chicago         200                300
A  New York         300                700
A  New York         400                700
B   Chicago         500               1200
B  New York         600               1400
B   Chicago         700               1200
B  New York         800               1400
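
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the DataFrame from the question and applies the line above. It also shows one possible alternative spelling, pd.Grouper(level=0), which refers to the index by level instead of passing df.index itself. The names data, summed and summed_alt are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
data = {
    'population': [100, 200, 300, 400, 500, 600, 700, 800],
    'city_name': ['Chicago', 'Chicago', 'New York', 'New York',
                  'Chicago', 'New York', 'Chicago', 'New York'],
}
df = pd.DataFrame(data, index=group)

# Approach from the answer: group by the index values and the column,
# then broadcast each group's sum back onto its rows.
summed = df.groupby([df.index, 'city_name'])['population'].transform('sum')

# Alternative spelling (an assumption of this sketch, not from the answer):
# refer to the index by level with pd.Grouper instead of passing df.index.
summed_alt = (
    df.groupby([pd.Grouper(level=0), 'city_name'])['population']
      .transform('sum')
)

df['population_summed'] = summed
print(df)
```

Both spellings should produce the same values here. Passing df.index is the shortest for a single unnamed index, while pd.Grouper(level=...) may read more clearly when the index has several levels.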
