Suppose I have the following DataFrame:
import pandas as pd
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
# column order follows the dict's key order, so city_name is listed first
data = {'city_name': ['Chicago', 'Chicago', 'New York', 'New York', 'Chicago', 'New York', 'Chicago', 'New York'],
        'population': [100, 200, 300, 400, 500, 600, 700, 800]}
df = pd.DataFrame(data, index=group)
   city_name  population
A    Chicago         100
A    Chicago         200
A   New York         300
A   New York         400
B    Chicago         500
B   New York         600
B    Chicago         700
B   New York         800
I want to sum population within each (index, city_name) group and store the result in a new column of the same DataFrame. For example, I would like a DataFrame that looks like this:
   city_name  population  population_summed
A    Chicago         100                300
A    Chicago         200                300
A   New York         300                700
A   New York         400                700
B    Chicago         500               1200
B   New York         600               1400
B    Chicago         700               1200
B   New York         800               1400
I'm having trouble because I'm not sure how to use groupby with both the index and a column.
You can pass both the index and the column to groupby as a list: [df.index, 'city_name']. Then use .transform('sum') on the groupby object to create a new Series of values aligned with the original rows:
df['population_summed'] = df.groupby([df.index, 'city_name'])['population'].transform('sum')
This gives:
   city_name  population  population_summed
A    Chicago         100                300
A    Chicago         200                300
A   New York         300                700
A   New York         400                700
B    Chicago         500               1200
B   New York         600               1400
B    Chicago         700               1200
B   New York         800               1400
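To see why .transform('sum') is used rather than .sum(), the sketch below compares the two on the same grouping. .sum() collapses each (index label, city_name) pair to a single row, while .transform('sum') repeats each group's total on every original row, so it lines up with df and can be assigned as a column. It also shows a variant that names the index and passes that name as a string key; the name 'group' is a hypothetical choice for illustration, not something from the question.

```python
import pandas as pd

# Rebuild the example frame from the question.
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
data = {
    'city_name': ['Chicago', 'Chicago', 'New York', 'New York',
                  'Chicago', 'New York', 'Chicago', 'New York'],
    'population': [100, 200, 300, 400, 500, 600, 700, 800],
}
df = pd.DataFrame(data, index=group)

grouped = df.groupby([df.index, 'city_name'])['population']

# .sum() aggregates: one value per (index label, city_name) pair, 4 rows here.
totals = grouped.sum()

# .transform('sum') broadcasts each group's total back to every row of that
# group, so the result has the same index and length as df.
summed = grouped.transform('sum')

# Variant: give the index a (hypothetical) name and refer to it by string;
# groupby accepts index level names alongside column names.
named = df.rename_axis('group')
summed_by_name = named.groupby(['group', 'city_name'])['population'].transform('sum')

print(totals)
print(summed.tolist())
print(summed_by_name.tolist())
```

Either form of the key list works; passing df.index directly avoids having to name the index, while the named version can read more clearly when the index has a meaningful label.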