Reputation: 3
I have an issue where all records for particular dates are duplicated in a sales table.
I need to be able to filter the table to find which dates were duplicated.
I know how to get duplicates by using the below query fine.
SELECT COUNT(*) AS AmtOfDuplicates,
[DATE], CODE, QTY, TRANSACTION_NO
FROM SALES
WHERE [DATE] >= '2023-01-01'
GROUP BY [DATE], CODE, QTY, TRANSACTION_NO
HAVING COUNT(*) > 1
ORDER BY [DATE], CODE, QTY, TRANSACTION_NO
The issue with this is that some duplicates are valid, ie. a certain transaction can have the same item and qty scanned more than once for a transaction, so thousands more records are returned.
Below is some sample data to explain further.
DATE | CODE | QTY | TRANS_NO |
---|---|---|---|
2/2/2023 | 123 | 1 | 1 |
2/2/2023 | XYZ | 2 | 1 |
13/3/2023 | 123 | 1 | 2 |
13/3/2023 | XYZ | 2 | 2 |
13/3/2023 | 123 | 1 | 2 |
15/4/2023 | 123 | 4 | 3 |
15/4/2023 | XYZ | 3 | 4 |
15/4/2023 | ABC | 1 | 4 |
15/4/2023 | 123 | 4 | 3 |
15/4/2023 | XYZ | 3 | 4 |
15/4/2023 | ABC | 1 | 4 |
As you can see above, the duplicate of item 123 on 13/3/2023 is valid as the item was scanned more than once for that transaction, and we know that date is not duplicate as not all records for that date are twice. However you can see that every record for 15/4/2023 is twice, so that date was duplicate and I just need a way to find that date.
My query above would return:
AmtOfDuplicates | DATE | CODE | QTY | TRANS_NO |
---|---|---|---|---|
2 | 13/3/2023 | 123 | 1 | 2 |
2 | 15/4/2023 | 123 | 4 | 3 |
2 | 15/4/2023 | ABC | 1 | 4 |
2 | 15/4/2023 | XYZ | 3 | 4 |
What I need is just the date that has every record for that date twice:
DATE | DUPLICATE |
---|---|
15/4/2023 | YES |
Upvotes: 0
Views: 65
Reputation: 9287
You can do something like this:
SELECT date
FROM (
SELECT *
, COUNT(*) OVER(PARTITION BY date, code, qty, trans_no) AS cnt
FROM (
VALUES (N'2/2/2023', N'123', 1, 1)
, (N'2/2/2023', N'XYZ', 2, 1)
, (N'13/3/2023', N'123', 1, 2)
, (N'13/3/2023', N'XYZ', 2, 2)
, (N'13/3/2023', N'123', 1, 2)
, (N'15/4/2023', N'123', 4, 3)
, (N'15/4/2023', N'XYZ', 3, 4)
, (N'15/4/2023', N'ABC', 1, 4)
, (N'15/4/2023', N'123', 4, 3)
, (N'15/4/2023', N'XYZ', 3, 4)
, (N'15/4/2023', N'ABC', 1, 4)
) t (DATE,CODE,QTY,TRANS_NO)
) x
GROUP BY date
HAVING max(cnt) = 2
AND min(cnt) = 2
Since you want to check so all rows match your criteria, it's usually simplest to create a counter by use of a window aggregate function in the first step, and then you can use the "real" group by to make sure all rows for a particular date match your condition.
Upvotes: 0
Reputation: 416149
For step one, do a query to show rows that are definitely unique:
SELECT DATE, CODE, QTY, TRANS_NO
FROM Sales
GROUP BY DATE, CODE, QTY, TRANS_NO
HAVING COUNT(*) = 1
This will exclude some valid (non-duplicate) rows, such as the original 123 -- 13/3/2023
example. That's okay, because we really only care about the date. Logically, rows matching any of the dates returned by this query are NOT duplicates.
So for step 2, let's group by these dates and select only that column:
SELECT DATE
FROM (
SELECT DATE, CODE, QTY, TRANS_NO
FROM Sales
GROUP BY DATE, CODE, QTY, TRANS_NO
HAVING COUNT(*) = 1
) d
GROUP BY DATE
Finally, for step 3, do a NOT IN()
, EXCLUSION JOIN
, or NOT EXISTS()
to remove these rows:
Option 1 (Exclusion):
SELECT *
FROM Sales s
LEFT JOIN (
SELECT DATE
FROM (
SELECT DATE, CODE, QTY, TRANS_NO
FROM Sales
GROUP BY DATE, CODE, QTY, TRANS_NO
HAVING COUNT(*) = 1
) d
GROUP BY DATE
) dt ON dt.DATE = s.DATE
WHERE dt.DATE IS NULL
Option 2 NOT EXISTS():
SELECT *
FROM Sales s
WHERE NOT EXISTS (
SELECT 1
FROM (
SELECT DATE
FROM (
SELECT DATE, CODE, QTY, TRANS_NO
FROM Sales
GROUP BY DATE, CODE, QTY, TRANS_NO
HAVING COUNT(*) = 1
) d
GROUP BY DATE
) dt
WHERE dt.DATE = s.DATE
)
And now only duplicates will be shown.
Upvotes: 0