user13949481
user13949481

Reputation:

Weekly sum not equal to the monthly sum pandas

Here's a subset of my data: (for minimum reproducible code)

01-01-20,128921.04
02-01-20,125338.56
03-01-20,100824.66
04-01-20,129203.39
05-01-20,164149.36
06-01-20,120360.65
07-01-20,113249.99
08-01-20,130191.88
09-01-20,101189.75
10-01-20,103243.14
11-01-20,105493.14
12-01-20,140929.83
13-01-20,111561.6
14-01-20,99481.08
15-01-20,121164.38
16-01-20,99812.96
17-01-20,87739.93
18-01-20,107949.75
19-01-20,115452.02
20-01-20,130104.09
21-01-20,110515.02
22-01-20,121331.84
23-01-20,96556.54
24-01-20,80605.72
25-01-20,92569.24
26-01-20,138800.83
27-01-20,92882.83
28-01-20,92318.18
29-01-20,102570.51
30-01-20,83223.74
31-01-20,78875.25
01-02-20,92812.68
02-02-20,100769.52
03-02-20,100436.34
04-02-20,90445.25
05-02-20,111517.47
06-02-20,97293.94
07-02-20,86028.53
08-02-20,98991.03
09-02-20,139218.02
10-02-20,112893.87
11-02-20,88203.69
12-02-20,103741.31
13-02-20,78289.06
14-02-20,87642.83
15-02-20,89486.19
16-02-20,135595
17-02-20,107230.7
18-02-20,93497.96
19-02-20,110915.03
20-02-20,85369.66
21-02-20,89731.12
22-02-20,100323.54
23-02-20,138563.05
24-02-20,94594.24
25-02-20,92561.89
26-02-20,138485.5
27-02-20,90229.11
28-02-20,80458.92
29-02-20,98203.12
01-03-20,133082.48
02-03-20,97570.18
03-03-20,85414.1
04-03-20,92503.9
05-03-20,80030.37
06-03-20,93478.99
07-03-20,103215.04
08-03-20,97862.86
09-03-20,106146.68
10-03-20,86679.19
11-03-20,84391.71
12-03-20,56142.73
13-03-20,73461.15
14-03-20,68306.21
15-03-20,81142.78
16-03-20,47639.16
17-03-20,45149.38
18-03-20,44238.06
19-03-20,101972.48
20-03-20,59555.46
21-03-20,60318.8
22-03-20,83097.88
23-03-20,44619.96
24-03-20,74030.29
25-03-20,122993.73
26-03-20,84086.55
27-03-20,94102.68
28-03-20,96118.46
29-03-20,125752.09
30-03-20,90891.25
31-03-20,101679.05
01-04-20,86656.78
02-04-20,93691.5
03-04-20,88798.26
04-04-20,94957.87
05-04-20,120458.96
06-04-20,93696.39
07-04-20,100141.77
08-04-20,137059.7
09-04-20,100172.98
10-04-20,138442.39
11-04-20,93426.7
12-04-20,133409.21
13-04-20,102913.83
14-04-20,130870.24
15-04-20,125922.79
16-04-20,129950.29
17-04-20,107779.74
18-04-20,115455.62
19-04-20,178068.78
20-04-20,109783.91
21-04-20,146683.79
22-04-20,126564.06
23-04-20,147555.21
24-04-20,118911.06
25-04-20,130732.86
26-04-20,155833.93
27-04-20,126342.17
28-04-20,162196.66
29-04-20,150163.04
30-04-20,183581.71
01-05-20,136151.93
02-05-20,115936.72
03-05-20,174859.86
04-05-20,146856.66
05-05-20,157275.86
06-05-20,138543.82
07-05-20,147154.01
08-05-20,116823.23
09-05-20,125782.39
10-05-20,151169.15
11-05-20,130266.93
12-05-20,149603.86
13-05-20,118424.35
14-05-20,163479.16
15-05-20,128805.56
16-05-20,133935.71
17-05-20,186671.37
18-05-20,111359.55
19-05-20,65546
20-05-20,282153.48
21-05-20,169061.38
22-05-20,135515.54
23-05-20,151844.17
24-05-20,186766.55
25-05-20,189683.25
26-05-20,169501.28
27-05-20,184518.67
28-05-20,153312.42
29-05-20,195488.54
30-05-20,164950.36
31-05-20,203940.3
01-06-20,157677.94
02-06-20,152351.26
03-06-20,186902.06
04-06-20,169101.35
05-06-20,177544.26
06-06-20,187603.88
07-06-20,258896.99
08-06-20,200043.02
09-06-20,236888.89
10-06-20,177424.67
11-06-20,192883.2
12-06-20,160047.5
13-06-20,165664.22
14-06-20,248579.8
15-06-20,170747.19
16-06-20,231164.85
17-06-20,167219.84
18-06-20,156622
19-06-20,173309.74
20-06-20,157201.39
21-06-20,193249.82
22-06-20,134175.03
23-06-20,199480.99
24-06-20,157446.14
25-06-20,150972.53
26-06-20,154339.08
27-06-20,148897.88
28-06-20,182340.04
29-06-20,141234.55
30-06-20,288646.56
01-07-20,282625.02
02-07-20,248820.12
03-07-20,235875.02
04-07-20,240315.18
05-07-20,270446.5
06-07-20,154621.78
07-07-20,138240.25
08-07-20,146545.45
09-07-20,124980.06
10-07-20,
11-07-20,
...

When I read this data in pandas, I tried to sum it based on both weekly basis and monthly basis as follows:

week_wise = data.groupby([(data.index.year), (data.index.week)])\
                    .sum().loc[2020:][[col for col in data.columns if 'Date' not in col]]

month_wise = data.groupby([(data.index.year), (data.index.month)])\
                    .sum().loc[2017:][[col for col in data.columns if 'Date' not in col]]

However month_wise has values: (Rest are zero)

$3,426,610.90
$2,933,528.57
$2,615,673.65
$3,730,222.20
$4,785,382.06
$5,478,656.67

which sum up to: 22970074.05 while week_wise returns values:

$648,437.00
$814,658.40
$743,161.70
$770,483.30
$643,452.70
$723,930.60
$695,851.90
$725,631.10
$727,615.30
$650,075.40
$556,270.50
$441,971.20
$641,703.80
$677,133.70
$796,349.10
$890,961.30
$936,064.80
$1,049,232.00
$983,605.10
$1,011,187.00
$1,102,247.00
$1,261,395.00
$1,290,078.00
$1,381,531.00
$1,249,515.00
$1,127,652.00
$1,707,963.00
$564,387.50

which sum up to: $24,164,107.40

However when I sum within pandas, both the sums are equal to a third value which is as: 2.481254e+07. The sums I mentioned earlier are obtained from pasting the values into Excel and applying SUM formula. So why are the two sums different when I know I am grouping them correctly. More importantly how come none of them matches with the sum calculated externally? Any help is much appreciated!

Upvotes: 3

Views: 144

Answers (1)

Hamza
Hamza

Reputation: 6025

I have recheked your values and pandas is indeed giving you the right sum.

Why monthly sum does not match with pandas sum?

Your data have dates which goes into July yet the sums you mentioned are only till June. So obviously it will be less when you paste these values into an external software!

Why weekly sum does not match with pandas sum?

It does match the pandas sum. I calculated in Google Sheets but you can check in any other way and both the sums are the same.

Upvotes: 1

Related Questions