Zareen Ahmad

Reputation: 15

Sum using groupby not giving expected result

I need to sum the values of one column, grouped by another column, and overwrite the DataFrame with the result.

I have tried:

df.groupby('S/T name')['Age group (Years)Total Persons'].sum()

DataFrame to apply the sum to:

S/T code        S/T name          city name         population
1                NSW            Greater sydney       1000
1                NSW            rest of nsw          100
1                NSW            rest of nsw          2000
2                Victoria       Geelong              1200
2                Victoria       Melbourne            1300
2                Victoria       Melbourne            1000

Required output:

S/T code        S/T name        population
1                NSW                3100
2                Victoria           3500

Upvotes: 0

Views: 192

Answers (2)

oreopot

Reputation: 3450

Try the following code:

Solution 1

grouped_df = df.groupby('S/T name')['population'].sum()
print(grouped_df)

The above code groups the rows by the S/T name column and returns the sum of the population column for each group.

Solution 2

grouped_df1 = df.groupby('S/T name').agg({'S/T code':'unique','population': 'sum'})
grouped_df1
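
A note on Solution 2: 'unique' stores an array of codes for each group rather than a single value. If each S/T name maps to exactly one S/T code (an assumption based on the sample data in the question), 'first' is one possible alternative that keeps the code as a plain integer. The sketch below rebuilds the question's table by hand; the variable name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "S/T code": [1, 1, 1, 2, 2, 2],
    "S/T name": ["NSW", "NSW", "NSW", "Victoria", "Victoria", "Victoria"],
    "city name": ["Greater sydney", "rest of nsw", "rest of nsw",
                  "Geelong", "Melbourne", "Melbourne"],
    "population": [1000, 100, 2000, 1200, 1300, 1000],
})

# 'first' takes the first code seen in each group (assumes one code per name);
# reset_index() turns the group key back into an ordinary column
result = (
    df.groupby("S/T name")
      .agg({"S/T code": "first", "population": "sum"})
      .reset_index()
)
print(result)
```

If a state could ever carry more than one code, 'unique' (or grouping by both columns) is the safer choice, since 'first' would silently hide the extra codes.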

Upvotes: 0

cullzie

Reputation: 2755

You seem to be summing the wrong column in your example; switching to population would have got you most of the way:

df.groupby('S/T name')['population'].sum()

Since you want to retain the S/T code column, though, you can use agg, calling sum on the population column and mean on the S/T code column:

df.groupby('S/T name').agg({'population': 'sum', 'S/T code': 'mean'})

Output:

S/T name        S/T code  population              
NSW              1        3100
Victoria         2        3500
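
One more option, offered as a sketch rather than the definitive fix: since mean on an integer column returns floats, you may prefer to group by both key columns instead, which leaves S/T code untouched. This assumes every S/T name has a single S/T code, as in the sample data; the name totals is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "S/T code": [1, 1, 1, 2, 2, 2],
    "S/T name": ["NSW", "NSW", "NSW", "Victoria", "Victoria", "Victoria"],
    "city name": ["Greater sydney", "rest of nsw", "rest of nsw",
                  "Geelong", "Melbourne", "Melbourne"],
    "population": [1000, 100, 2000, 1200, 1300, 1000],
})

# Group by both key columns; as_index=False keeps them as regular columns
totals = df.groupby(["S/T code", "S/T name"], as_index=False)["population"].sum()

# Overwrite the original DataFrame with the aggregated values
df = totals
print(df)
```

Because groupby returns a new object, the assignment back to df is what actually replaces the original DataFrame, which is the part the question asks about.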

Upvotes: 1
