Felipe Carreón

Reputation: 53

Pandas group by and sum

I'm trying to sum a column after a groupby.

Here is my data:

|Day       |SMSsNumber|ShortCode|
|----------|----------|---------|
|2020-08-25|647       |26243    |
|2020-08-25|6,396     |76973    |
|2020-08-25|16,615    |51532    |
|2020-08-25|315       |59230    |
|2020-08-25|4,732     |30210    |
|2020-08-25|209       |32261    |
|2020-08-25|7         |54835    |

I already grouped by Day, but I need to sum the SMSsNumber column.

This is what I'm getting:

|Day       |SMSsNumber|Codes|
|----------|----------|-----|
|2020-08-25|647       |26243|
|          |6,396     |76973|
|          |16,615    |51532|
|          |315       |59230|
|          |4,732     |30210|
|          |209       |32261|
|          |7         |54835|

And I need to get the info like this:

|Day       |SMSsNumber|Codes|
|----------|----------|-----|
|2020-08-25|28921     |26243|
|          |          |76973|
|          |          |51532|
|          |          |59230|
|          |          |30210|
|          |          |32261|
|          |          |54835|

This is my code:

import pandas as pd

read = pd.read_csv('data.csv')
group_day = read.groupby(['Day','SMSsNumber']).sum()
group_day.to_html('test.html')
print(group_day.head())

:c

Upvotes: 0

Views: 62

Answers (2)

89f3a1c

Reputation: 1488

group_day = read.groupby(['Day','SMSsNumber']).sum()

In the code above, you're grouping by two columns, so each distinct (Day, SMSsNumber) pair becomes its own group and no rows are combined.

What you want is to group by the first, and sum the second:

group_day = read.groupby(['Day'])['SMSsNumber'].sum()

If you don't specify which column to perform the sum on, you'll get the sum for all columns supporting the operation.
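As a minimal sketch of the suggestion above, the snippet below rebuilds the question's sample rows in memory and sums SMSsNumber per Day. It assumes the thousands separators shown in the question ("6,396") are really present in the CSV, in which case pandas would otherwise read the column as text; passing thousands=',' to read_csv is one way to handle that. The inline csv_text stands in for data.csv and is an assumption, not the asker's actual file.

```python
import io

import pandas as pd

# Sample rows copied from the question. Values containing a thousands
# separator are quoted so the comma is not treated as a field delimiter.
csv_text = '''Day,SMSsNumber,ShortCode
2020-08-25,647,26243
2020-08-25,"6,396",76973
2020-08-25,"16,615",51532
2020-08-25,315,59230
2020-08-25,"4,732",30210
2020-08-25,209,32261
2020-08-25,7,54835
'''

# thousands=',' tells pandas to parse "6,396" as the integer 6396
# instead of leaving the column as strings.
df = pd.read_csv(io.StringIO(csv_text), thousands=',')

# Group by Day only, then sum just the SMSsNumber column.
totals = df.groupby('Day')['SMSsNumber'].sum()
print(totals)
```

For the sample data this gives a single total of 28921 for 2020-08-25, which matches the value the question asks for.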

Upvotes: 1

DYZ

Reputation: 57033

Do not group by SMSsNumber:

read.groupby('Day').sum()

If there are other columns that you want to avoid, select the columns explicitly:

read.groupby('Day')[['SMSsNumber','ShortCode']].sum()
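If the goal is the layout shown in the question (one total per day with the codes listed next to it), summing ShortCode is probably not what is wanted. One possible approach, sketched here with an in-memory DataFrame that mirrors the question's rows, is to aggregate each column differently: sum for SMSsNumber and a plain list for ShortCode. The names summary and Codes are illustrative choices, and the numbers are assumed to be already parsed as integers.

```python
import pandas as pd

# In-memory copy of the question's sample rows, with SMSsNumber
# already numeric (no thousands separators).
df = pd.DataFrame({
    'Day': ['2020-08-25'] * 7,
    'SMSsNumber': [647, 6396, 16615, 315, 4732, 209, 7],
    'ShortCode': [26243, 76973, 51532, 59230, 30210, 32261, 54835],
})

# Named aggregation: a different function per output column.
summary = df.groupby('Day').agg(
    SMSsNumber=('SMSsNumber', 'sum'),   # total messages per day
    Codes=('ShortCode', list),          # all short codes seen that day
)
print(summary)
```

This keeps one row per day; the Codes cell holds a Python list rather than separate rows, so it may need further formatting before being written to HTML.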

Upvotes: 1
