Greg91
Greg91

Reputation: 75

Postgres Rows Between Causing Duplicates

I have a simple query that calculates the sum of values for current row and 11 preceding rows. It works fine when the number of rows is greater than 12 , but when it is less than 12 the data gets duplicated to fill in the missing values.

TOTALS TABLE:

ID|Report_Month| Total
1 |2018-08-01 |5
2 |2018-09-01 |25
3 |2018-10-01  |15

Sample Code:

select distinct 
         ID,
         Report_Month,
         Total,
         sum(Total) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from TOTALS_TABLE;

Expected Output:

ID|Report_Month|Total|Running_Total
1 | 2018-08-01 | 5 | 5
2 | 2018-09-01 | 25 | 30
3 | 2018-10-01 | 15 | 45

Actual Output:

1 | 2018-08-01 | 5 | 5
1 | 2018-08-01 | 5 | 10
1 | 2018-08-01 | 5 | 15
1 | 2018-08-01 | 5 | 20
2 | 2018-09-01 | 25 | 45
2 | 2018-09-01 | 25 | 70
2 | 2018-09-01 | 25 | 95
2 | 2018-09-01 | 25 | 120
3 | 2018-10-01 | 15 | 135
3 | 2018-10-01 | 15 | 150
3 | 2018-10-01 | 15 | 165
3 | 2018-10-01 | 15 | 180

Any help would be greatly appreciated, I feel like I am very close probably missing something.

Upvotes: 0

Views: 662

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269503

The clue is the select distinct. This should not be necessary. If the underlying table has duplicates, you should fix that. In the meantime, you can try to adjust the query.

I'm not sure what the right fix is. Here are two possibilities.

If entire rows are duplicated:

select ID, Report_Month, Total,
       sum(Total) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from (select distinct tt.*
      from TOTALS_TABLE tt
     ) tt;

If the totals table has subtotals on each dy that need to be added up:

select ID, Report_Month,
       sum(Total) as month_total,
       sum(sum(Total)) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from TOTALS_TABLE tt
group by id, Report_Month;

Upvotes: 1

Caius Jard
Caius Jard

Reputation: 74605

You seem to want a query that sums over different IDs but you've told the sum to partition on ID, this means your running total will reset every time ID changes (== there is no way the query you've posted can produce the results you posted, even if Postgres is spontaneously inventing rows to provide something to summate). Remove the partition

https://www.db-fiddle.com/#&togetherjs=fw7TIVul3H

enter image description here

I didn't experience the duplicate rows problem and I can't see why adding an analytic would cause it. I think your source table or query genuinely has duplicate rows (I think your use of distinct is attempting to remove them) and the analytic is working fine. Do a

Select * from totals_table 

And check your data is ok. If it has duplicate rows you can't remove them with distinct in the manner you have, because distinct takes the result of the running total into account (and it makes each row unique). It would be better to solve the duplication issue at source, than try and distinct them out later but if you're set on doing that you'll have to do your distinct in an inner query and your running total in an outer query

Upvotes: 0

Related Questions