Reputation: 75
I have a simple query that calculates the sum of values for current row and 11 preceding rows. It works fine when the number of rows is greater than 12 , but when it is less than 12 the data gets duplicated to fill in the missing values.
TOTALS TABLE:
ID|Report_Month| Total
1 |2018-08-01 |5
2 |2018-09-01 |25
3 |2018-10-01 |15
Sample Code:
select distinct
ID,
Report_Month,
Total,
sum(Total) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from TOTALS_TABLE;
Expected Output:
ID|Report_Month|Total|Running_Total
1 | 2018-08-01 | 5 | 5
2 | 2018-09-01 | 25 | 30
3 | 2018-10-01 | 15 | 45
Actual Output:
1 | 2018-08-01 | 5 | 5
1 | 2018-08-01 | 5 | 10
1 | 2018-08-01 | 5 | 15
1 | 2018-08-01 | 5 | 20
2 | 2018-09-01 | 25 | 45
2 | 2018-09-01 | 25 | 70
2 | 2018-09-01 | 25 | 95
2 | 2018-09-01 | 25 | 120
3 | 2018-10-01 | 15 | 135
3 | 2018-10-01 | 15 | 150
3 | 2018-10-01 | 15 | 165
3 | 2018-10-01 | 15 | 180
Any help would be greatly appreciated, I feel like I am very close probably missing something.
Upvotes: 0
Views: 662
Reputation: 1269503
The clue is the select distinct
. This should not be necessary. If the underlying table has duplicates, you should fix that. In the meantime, you can try to adjust the query.
I'm not sure what the right fix is. Here are two possibilities.
If entire rows are duplicated:
select ID, Report_Month, Total,
sum(Total) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from (select distinct tt.*
from TOTALS_TABLE tt
) tt;
If the totals table has subtotals on each dy that need to be added up:
select ID, Report_Month,
sum(Total) as month_total,
sum(sum(Total)) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from TOTALS_TABLE tt
group by id, Report_Month;
Upvotes: 1
Reputation: 74605
You seem to want a query that sums over different IDs but you've told the sum to partition on ID, this means your running total will reset every time ID changes (== there is no way the query you've posted can produce the results you posted, even if Postgres is spontaneously inventing rows to provide something to summate). Remove the partition
https://www.db-fiddle.com/#&togetherjs=fw7TIVul3H
I didn't experience the duplicate rows problem and I can't see why adding an analytic would cause it. I think your source table or query genuinely has duplicate rows (I think your use of distinct is attempting to remove them) and the analytic is working fine. Do a
Select * from totals_table
And check your data is ok. If it has duplicate rows you can't remove them with distinct in the manner you have, because distinct takes the result of the running total into account (and it makes each row unique). It would be better to solve the duplication issue at source, than try and distinct them out later but if you're set on doing that you'll have to do your distinct in an inner query and your running total in an outer query
Upvotes: 0