Reputation: 672
I am trying to write a query that will give me the details of all duplicate invoices per vendor.
I cannot use the group by since I need all the details of the invoices. So far this is what I have tried
select
vendid, InvcNbr as InvcNbr, InvcDate, OrigDocAmt, PayDate,
dense_RANK() over (partition by vendid order by invcnbr) RN
from APDoc
where InvcDate >= '10/01/2013'
Not sure how to proceed from here.
vendid InvcNbr InvcDate OrigDoc Paydate RN
AAA 1067458361 10/2/2013 0.00 11/1/2013 8
AAA 1067461099 10/2/2013 16.08 11/1/2013 9
AAA 1067461099 10/2/2013 16.08 11/1/2013 9
AAA 1067461101 10/2/2013 16.08 11/1/2013 10
AAA 1067461101 10/2/2013 16.08 11/1/2013 10
AAA 1067461102 10/2/2013 16.08 11/1/2013 11
AAA 1067461102 10/2/2013 16.08 11/1/2013 11
AAA 1067461103 10/2/2013 92.45 11/1/2013 12
AAA 1067461103 10/2/2013 92.45 11/1/2013 12
Upvotes: 2
Views: 95
Reputation: 4037
Assuming you have a primary key on the table, you can execute this (potentially) very quickly by using the EXISTS clause
select * from APDoc a1 where exists
(
select 1 from APDoc a2
where a1.pk <> a2.pk
and a1.vendid = a2.vendid and a1.invcnbr = a2.invcnbr
)
and InvcDate >= '10/01/2013'
order by vendid , invcnbr
This allows the query optimizer to generate a plan that does not require aggregation which in a very large table with many duplicates will be expensive.
Upvotes: 0
Reputation: 12796
Use Group By
with the Having
clause to identify the duplicates, and then join these results to an outer query to see the details of the duplicates.
Here is an example of how you can do this.
SELECT a.vendid,a.InvcNbr as InvcNbr,a.InvcDate,a.OrigDocAmt,a.PayDate
FROM APDoc a
JOIN (
SELECT vendid, InvcNbr
FROM APDoc
WHERE InvcDate >= '10/01/2013'
GROUP BY vendid,InvcNbr HAVING COUNT(*) > 1
) b ON a.vendid = b.vendid AND a.InvcNbr = b.InvcNbr
Upvotes: 4
Reputation: 1032
This also works and might be easier to understand.
select InvcNbr, COUNT(InvcNbr) as [count]
into #temp1
from #APDoc
group by InvcNbr
select a.vendid, a.InvcNbr, a.InvcDate, a.OrigDoc, a.Paydate, a.RN
from APDoc a, #temp1 b
where a.InvcNbr = b.InvcNbr
and b.[count] = 2
Upvotes: 0
Reputation: 2983
Something like this using Common-Table-Expressions could build the query up as required.
WITH TempCTE AS (SELECT InvcNbr, vendid, ROW_NUMBER() OVER (PARITION
BY vendid, InvcNbr order by invcnbr ) AS RowNum FROM APDoc),
// Find all combinations of InvcNbr/vendid exist
TempCTE2 AS (SELECT InvcNbr, vendid FROM TempCTE WHERE RowNum > 1)
// Get all the combinations of InvcNbr/vendid
SELECT * FROM TempCTE2
INNER JOIN APDoc ON TempCTE2.InvcNbr = APDoc.InvcNbr
AND APDoc.vendid = TempCTE2.vendid
Upvotes: 0