Reputation: 53
I'm trying to sum a column after a groupby.
Here is my data:
|Day |SMSsNumber|ShortCode|
|----------|----------|---------|
|2020-08-25|647 |26243 |
|2020-08-25|6,396 |76973 |
|2020-08-25|16,615 |51532 |
|2020-08-25|315 |59230 |
|2020-08-25|4,732 |30210 |
|2020-08-25|209 |32261 |
|2020-08-25|7 |54835 |
I already grouped by Day, but I need to sum the SMSsNumber column.
This is what I'm getting:
|Day |SMSsNumber|Codes|
|----------|----------|-----|
|2020-08-25|647 |26243|
| |6,396 |76973|
| |16,615 |51532|
| |315 |59230|
| |4,732 |30210|
| |209 |32261|
| |7 |54835|
And I need to get the info like this:
|Day |SMSsNumber|Codes|
|----------|----------|-----|
|2020-08-25|28921 |26243|
| | |76973|
| | |51532|
| | |59230|
| | |30210|
| | |32261|
| | |54835|
This is my code:
import pandas as pd
read = pd.read_csv('data.csv')
group_day = read.groupby(['Day','SMSsNumber']).sum()
group_day.to_html('test.html')
print(group_day.head())
:c
Upvotes: 0
Views: 62
Reputation: 1488
group_day = read.groupby(['Day','SMSsNumber']).sum()
In the code above, you're grouping by two columns.
What you want is to group by the first, and sum the second:
group_day = read.groupby(['Day'])['SMSsNumber'].sum()
If you don't specify which column to perform the sum on, you'll get the sum for all columns supporting the operation.
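One thing to watch for: the sample values contain thousands separators (6,396), so the column may be read as text rather than numbers, in which case the sum will not be the arithmetic total. Here is a rough sketch of the whole thing, assuming the CSV quotes those values and uses a comma as the thousands separator. The inline `csv_text` is my stand-in for data.csv, built from the rows in the question; I don't know how the real file is laid out.

```python
import io
import pandas as pd

# Stand-in for data.csv (assumption: values with commas are quoted in the file).
csv_text = '''Day,SMSsNumber,ShortCode
2020-08-25,647,26243
2020-08-25,"6,396",76973
2020-08-25,"16,615",51532
2020-08-25,315,59230
2020-08-25,"4,732",30210
2020-08-25,209,32261
2020-08-25,7,54835
'''

# thousands=',' tells read_csv to parse "6,396" as the integer 6396.
read = pd.read_csv(io.StringIO(csv_text), thousands=',')

# Group by Day only, then sum just the SMSsNumber column.
group_day = read.groupby('Day')['SMSsNumber'].sum()
print(group_day)
```

For the rows shown in the question this gives 28921 for 2020-08-25, which matches the expected table.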
Upvotes: 1
Reputation: 57033
Do not group by SMSsNumber:
read.groupby('Day').sum()
If there are other columns that you want to avoid, select the columns explicitly:
read.groupby('Day')[['SMSsNumber','ShortCode']].sum()
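If the goal is the exact layout from the question (one total per day, with the codes listed next to it), named aggregation may be closer to what is wanted than summing ShortCode. This is only a sketch: the DataFrame is typed in by hand from the question's rows with the numbers already parsed as integers, and the output column name `Codes` is taken from the desired table.

```python
import pandas as pd

# Rows from the question, with SMSsNumber already numeric (assumption).
read = pd.DataFrame({
    'Day': ['2020-08-25'] * 7,
    'SMSsNumber': [647, 6396, 16615, 315, 4732, 209, 7],
    'ShortCode': [26243, 76973, 51532, 59230, 30210, 32261, 54835],
})

# Sum the SMS counts per day and collect that day's short codes into a list.
summary = read.groupby('Day').agg(
    SMSsNumber=('SMSsNumber', 'sum'),
    Codes=('ShortCode', list),
)
print(summary)
```

Each day then has a single row holding the total and a list of its codes, so nothing is lost by collapsing the group.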
Upvotes: 1