Reputation: 15
I need to sum the values of one column, grouped by another column, and overwrite the dataframe with the result.
I have tried:
df.groupby('S/T name')['Age group (Years)Total Persons'].sum()
Dataframe to sum over:
S/T code  S/T name  city name       population
1         NSW       Greater sydney  1000
1         NSW       rest of nsw     100
1         NSW       rest of nsw     2000
2         Victoria  Geelong         1200
2         Victoria  Melbourne       1300
2         Victoria  Melbourne       1000
Required output:
S/T code  S/T name  population
1         NSW       3100
2         Victoria  3500
Upvotes: 0
Views: 192
Reputation: 3450
Try the following code:
Solution 1
grouped_df = df.groupby('S/T name')['population'].sum()
print(grouped_df)
The above code groups the rows by the S/T name column and gives the sum of the population column for each group.
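Since the question also asks to overwrite the dataframe, here is a minimal sketch of how that might look. It assumes the sample data from the question; the name totals is only illustrative. The grouped sum comes back as a Series indexed by S/T name, so reset_index() is used to turn it back into a dataframe before reassigning it to df.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "S/T code": [1, 1, 1, 2, 2, 2],
    "S/T name": ["NSW", "NSW", "NSW", "Victoria", "Victoria", "Victoria"],
    "city name": ["Greater sydney", "rest of nsw", "rest of nsw",
                  "Geelong", "Melbourne", "Melbourne"],
    "population": [1000, 100, 2000, 1200, 1300, 1000],
})

# Summing one column after groupby returns a Series indexed by 'S/T name'
totals = df.groupby("S/T name")["population"].sum()

# reset_index() moves 'S/T name' back into a regular column;
# reassigning to df replaces the original dataframe
df = totals.reset_index()
print(df)
```

Note that this drops the S/T code and city name columns, because only population was selected before summing.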
Solution 2
grouped_df1 = df.groupby('S/T name').agg({'S/T code': 'unique', 'population': 'sum'})
grouped_df1
Upvotes: 0
Reputation: 2755
You seem to be summing the wrong column in your example; switching to population would have got you most of the way:
df.groupby('S/T name')['population'].sum()
Since you want to retain the S/T code column, though, you can use agg, calling sum on the population column and mean on the S/T code column (the code is the same for every row of a state, so its mean is just that code):
df.groupby('S/T name').agg({'population': 'sum', 'S/T code': 'mean'})
Output:
S/T name  S/T code  population
NSW       1         3100
Victoria  2         3500
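As an alternative to aggregating S/T code with mean, one option is to group by both S/T code and S/T name, so the code is carried along as a grouping key rather than computed. This is a different technique from the agg call above, sketched here on the sample data from the question; the name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "S/T code": [1, 1, 1, 2, 2, 2],
    "S/T name": ["NSW", "NSW", "NSW", "Victoria", "Victoria", "Victoria"],
    "city name": ["Greater sydney", "rest of nsw", "rest of nsw",
                  "Geelong", "Melbourne", "Melbourne"],
    "population": [1000, 100, 2000, 1200, 1300, 1000],
})

# Group by both key columns; as_index=False keeps them as ordinary columns
# instead of moving them into the index
result = df.groupby(["S/T code", "S/T name"], as_index=False)["population"].sum()
print(result)
```

This keeps S/T code as an integer column and gives one row per state with the columns S/T code, S/T name and population, which is the layout asked for in the question.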
Upvotes: 1