soggypotato

Reputation: 23

Pandas sum function does not sum all the data

I want to write a simple Python program with pandas that adds up how much time each person spent on something, using data converted from an HTML file to an Excel file. Here is a sample of my data:

Name     Date          Minutes
foo      1/12/2000     100
foo      1/12/2000     75
foo      1/12/2020     10
foo      1/13/2020     50
bar      1/13/2020     25
bar      1/14/2020     120

I then tried groupby(["Name", "Date", "Minutes"]).sum(). My expected result is:

Name     Date          Minutes
foo      1/12/2020     185
         1/13/2020     50
bar      1/13/2020     25
         1/14/2020     120

but instead I get:

Name     Date          Minutes
foo      1/12/2020     100
                       75
                       10
         1/13/2020     50
bar      1/13/2020     25
         1/14/2020     120

I googled my problem first and came across this thread, but my result is different from the one shown there. I also tried using agg and changing the Minutes dtype to int64, but the result is the same. Any help is really appreciated.

Upvotes: 0

Views: 242

Answers (2)

Nick ODell

Reputation: 25190

If you want to sum the Minutes column, don't include it in the groupby. Including it in the groupby means that rows with different values of Minutes go into different groups, so there is nothing left to add up within each group.
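One way to see the effect is to count the groups each call produces. This is a minimal sketch that rebuilds the question's sample by hand (an assumption; the real data comes from an Excel file) and compares the number of groups with and without Minutes as a key.

```python
import pandas as pd

# Sample data copied from the question (built by hand here as an assumption;
# the original data was loaded from an Excel file).
df = pd.DataFrame({
    "Name": ["foo", "foo", "foo", "foo", "bar", "bar"],
    "Date": ["1/12/2000", "1/12/2000", "1/12/2020",
             "1/13/2020", "1/13/2020", "1/14/2020"],
    "Minutes": [100, 75, 10, 50, 25, 120],
})

# With Minutes as a key, every distinct (Name, Date, Minutes) is its own group.
with_minutes = df.groupby(["Name", "Date", "Minutes"]).ngroups

# Without it, rows sharing Name and Date fall into the same group.
without_minutes = df.groupby(["Name", "Date"]).ngroups

print(with_minutes, without_minutes)  # prints: 6 5
```

Six rows yield six groups in the first case, which is why the output looks like the unsummed input.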

Here's how to add up the Minutes for rows with the same Name and Date.

>>> df
  Name       Date  Minutes
0  foo  1/12/2000      100
1  foo  1/12/2000       75
2  foo  1/12/2020       10
3  foo  1/13/2020       50
4  bar  1/13/2020       25
5  bar  1/14/2020      120
>>> df.groupby(["Name", "Date"]).sum()
                Minutes
Name Date              
bar  1/13/2020       25
     1/14/2020      120
foo  1/12/2000      175
     1/12/2020       10
     1/13/2020       50

Upvotes: 0

Morteza Akbari

Reputation: 34

Remove "Minutes" from the groupby list.

Upvotes: 1
