Aggregating similar rows in Pandas

Question

I've got a dataframe that's currently aggregated by zip code and looks similar to this:

Year   Organization   State   Zip      Number_of_people

2021   A              NJ      07090    5
2020   B              AZ      09876    3
2021   A              NJ      01234    2
2021   C              VA      23456    7
2019   A              NJ      05385    1

I want to aggregate by state instead. Rows that are identical apart from Number_of_people should be combined, with their Number_of_people values summed, so that the data above looks like this:

Year   Organization   State   Number_of_people

2021   A              NJ      7
2020   B              AZ      3
2021   C              VA      7
2019   A              NJ      1

In other words, if rows are identical in all columns except Number_of_people, I want to combine them and add up their Number_of_people values.

I'm stuck on how to approach this after deleting the Zip column. I think I need to group by Year, Organization, and State, but I'm not sure what to do after that.
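A minimal sketch of the approach described above, using the sample data from the question. Two assumptions are made here: Zip is stored as strings so that leading zeros such as "07090" survive, and the variable names `by_state`, `keys`, and `result` are illustrative only. The sketch drops Zip, treats every remaining column except Number_of_people as a grouping key, and sums the count within each group.

```python
import pandas as pd

# Sample data from the question; Zip is stored as strings (an assumption)
# so that leading zeros are preserved.
df = pd.DataFrame({
    "Year": [2021, 2020, 2021, 2021, 2019],
    "Organization": ["A", "B", "A", "C", "A"],
    "State": ["NJ", "AZ", "NJ", "VA", "NJ"],
    "Zip": ["07090", "09876", "01234", "23456", "05385"],
    "Number_of_people": [5, 3, 2, 7, 1],
})

# Remove the column we no longer want to distinguish rows by.
by_state = df.drop(columns="Zip")

# Every remaining column except the count identifies a "similar" row.
keys = [c for c in by_state.columns if c != "Number_of_people"]

# Combine rows that share all key values and add up their counts.
result = by_state.groupby(keys, as_index=False)["Number_of_people"].sum()
print(result)
```

Building `keys` from the columns rather than hard-coding them means the same line keeps working if more descriptive columns are added later.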

Pawan Jain · Accepted Answer

If the Zip column is still in the dataframe, select Number_of_people explicitly so that only that column is summed:

df.groupby(['Year','Organization','State'], as_index=False)['Number_of_people'].sum()
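A sketch of the line above applied to the question's sample data (the DataFrame construction is an assumption). By default `groupby` sorts the result by the group keys, so 2019 would come first. The `sort=False` argument is an addition that is not part of the original answer; it keeps groups in the order they first appear, which reproduces the row order of the expected output in the question.

```python
import pandas as pd

# Sample data from the question (Zip kept as strings by assumption).
df = pd.DataFrame({
    "Year": [2021, 2020, 2021, 2021, 2019],
    "Organization": ["A", "B", "A", "C", "A"],
    "State": ["NJ", "AZ", "NJ", "VA", "NJ"],
    "Zip": ["07090", "09876", "01234", "23456", "05385"],
    "Number_of_people": [5, 3, 2, 7, 1],
})

# Only Number_of_people is aggregated, so Zip can stay in df.
# sort=False keeps first-appearance order instead of sorting by the keys.
result = df.groupby(
    ["Year", "Organization", "State"], as_index=False, sort=False
)["Number_of_people"].sum()
print(result)
```

Leave out `sort=False` if sorted output is preferable; the sums are the same either way.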

If you have already dropped the Zip column, you can sum every remaining non-key column:

df.groupby(['Year','Organization','State'], as_index=False).sum()
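A sketch comparing the two forms on the question's sample data (the DataFrame construction and the names `all_columns` and `selected` are assumptions for illustration). Because `.sum()` without a column selection aggregates every non-key column, Zip is dropped before grouping in the first form.

```python
import pandas as pd

# Sample data from the question (Zip kept as strings by assumption).
df = pd.DataFrame({
    "Year": [2021, 2020, 2021, 2021, 2019],
    "Organization": ["A", "B", "A", "C", "A"],
    "State": ["NJ", "AZ", "NJ", "VA", "NJ"],
    "Zip": ["07090", "09876", "01234", "23456", "05385"],
    "Number_of_people": [5, 3, 2, 7, 1],
})

keys = ["Year", "Organization", "State"]

# Form 1: drop Zip, then sum every column that is not a grouping key.
all_columns = df.drop(columns="Zip").groupby(keys, as_index=False).sum()

# Form 2: keep Zip in df but aggregate only the selected column.
selected = df.groupby(keys, as_index=False)["Number_of_people"].sum()

print(all_columns.equals(selected))
```

If Zip is not dropped first, what happens to it under the plain `.sum()` depends on its dtype and the pandas version: integer zip codes are summed, and string zip codes may be concatenated or silently dropped. Selecting the column explicitly avoids that ambiguity.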
