John Downing
John Downing

Reputation: 35

Oracle PL/SQL group by with date field produces confusing results

In a query, I did a GROUP BY on a date field that produced these summary results. The Query was like this:

SELECT  
(CASE 
WHEN std.attribute_1 like '%709%' OR std.attribute_1 like '%999%' THEN 'COMPA' -- COMPA invoices either start with 709 or 999
WHEN h.manual_upload = 'Y' then 'MANUAL_UPLOAD'  
ELSE 'OTHER' END) AS BILLING_SOURCE, std.created_on, COUNT(DISTINCT std.invoice_number) AS COUNT_OF_INVOICES
FROM onebiller.t_std_in_detail_his std
INNER JOIN onebiller.t_std_in_header h
ON h.job_id =  std.job_id
WHERE std.invoice_number IS NOT NULL
GROUP BY (CASE 
WHEN std.attribute_1 like '%709%' OR std.attribute_1 like '%999%' THEN 'COMPA' -- COMPA invoices either start with 709 or 999
WHEN h.manual_upload = 'Y' then 'MANUAL_UPLOAD'  
ELSE 'OTHER' END), std.created_on
ORDER BY std.created_on ASC

Note these results for 3 datetimes on 19-Mar-2021 from the created_on field.

enter image description here

I then used TRUNC(created_on) to try to group all the records from a single day. This was the updated query:

SELECT  
(CASE 
WHEN std.attribute_1 like '%709%' OR std.attribute_1 like '%999%' THEN 'COMPA' -- COMPA invoices either start with 709 or 999
WHEN h.manual_upload = 'Y' then 'MANUAL_UPLOAD'  
ELSE 'OTHER' END) AS BILLING_SOURCE, TRUNC(std.created_on), COUNT(DISTINCT std.invoice_number) AS COUNT_OF_INVOICES
FROM onebiller.t_std_in_detail_his std
INNER JOIN onebiller.t_std_in_header h
ON h.job_id =  std.job_id
WHERE std.invoice_number IS NOT NULL
GROUP BY (CASE 
WHEN std.attribute_1 like '%709%' OR std.attribute_1 like '%999%' THEN 'COMPA' -- COMPA invoices either start with 709 or 999
WHEN h.manual_upload = 'Y' then 'MANUAL_UPLOAD'  
ELSE 'OTHER' END), TRUNC(std.created_on)
ORDER BY TRUNC(std.created_on) ASC

I was expecting a result that would sum the 3 highlighted rows from the first query (2+165+164) instead I received a count of 166 for 19-Mar-2021. Why didn't I get the sum of (2+165+164)?

enter image description here

Upvotes: 0

Views: 56

Answers (1)

Alex Poole
Alex Poole

Reputation: 191245

Because you have the same invoice number repeated at different times on the same day. As a simpler example, say you have:

CREATED_ON INVOICE_NUMBER
2021-03-19 09:00:00 1
2021-03-19 09:00:00 2
2021-03-19 12:00:00 2
2021-03-19 15:00:00 1
2021-03-19 15:00:00 2
2021-03-19 15:00:00 3

That shows 6 rows, but only three distinct invoice numbers - 1, 2 and 3.

A simplified version of your first query gives:

SELECT std.created_on,
  COUNT(DISTINCT std.invoice_number) AS COUNT_OF_INVOICES
FROM std
GROUP BY std.created_on
ORDER BY std.created_on ASC
CREATED_ON COUNT_OF_INVOICES
2021-03-19 09:00:00 2
2021-03-19 12:00:00 1
2021-03-19 15:00:00 3

The sum of those counts currently matches the number of rows in the table, 6. (I haven't included any duplicates at the same time, so the distinct isn't doing anything at the moment, again to keep it simple.) The row for 09:00 counts invoice numbers 1 and 2; the row for 12:00 only counts 2; and the row for 15:00 counts 1, 2 and 3. The counts in all three rows include a count for invoice number 2, the first and third include a count for invoice number 1 - so the same invoice numbers are being counted multiple times.

SELECT TRUNC(std.created_on) as created_on,
  COUNT(DISTINCT std.invoice_number) AS COUNT_OF_INVOICES
FROM std
GROUP BY TRUNC(std.created_on)
ORDER BY TRUNC(std.created_on) ASC
CREATED_ON COUNT_OF_INVOICES
2021-03-19 00:00:00 3

Now the single result is three, because that's how many distinct invoice numbers there are that day - it's now counting 1, 2 and 3 once each, not 2x2, 3x2 and 1x3.

If you didn't have the distinct then the second query would also get 6, because it wouldn't be eliminating the duplicates seen in the first query.

fiddle

Upvotes: 2

Related Questions