Reputation: 6034
I have a report that displays a list of duplicate accounts based on our business rules. This works when one new account is matched with other existing accounts. Where I'm having trouble is when multiple new accounts match the same existing duplicate. Here's an example of how it looks now, grouped by NewId:
NewID MatchedID FirstName LastName AddDate Address PhoneNumber
10 10 Holly Johnson 4/18/2013 123 1St Rd. 123 456 7890
10 2 Hollie Johnson 1/1/1990 123 1St Rd. 123 456 7890
11 11 Holley Johnson 4/17/2013 123 1St Rd. 123-456-7890
11 2 Hollie Johnson 1/1/1990 123 First Rd. 123 456 7890
50 50 William Johnson 4/17/2013 999 2nd St. 222 222 2222
50 3 Bill Jonson 1/2/1990 999 Second St. 222-222-2222
Accounts that have matches are themselves included for comparison.
So, is there a way to group these similar accounts together without duplicates? It should look like this:
GroupID AcctID FirstName LastName AddDate Address PhoneNumber
1 2 Hollie Johnson 1/1/1990 123 First Rd. 123 456 7890
1 10 Holly Johnson 4/18/2013 123 1St Rd. 123 456 7890
1 11 Holley Johnson 4/17/2013 123 1St Rd. 123-456-7890
2 50 William Johnson 4/17/2013 999 2nd St. 222 222 2222
2 3 Bill Jonson 1/2/1990 999 Second St. 222-222-2222
I don't care if the grouping is done in SQL or in SSRS. It would need to reference the two ID columns, because the name, address, and phone number may be different. I also need a new GroupID assigned so that they can be grouped in the report.
Upvotes: 1
Views: 3530
Reputation: 39586
You can use ranking functions to eliminate rows:
with NoDuplicates as
(
select *
, rownum = row_number() over (partition by MatchedID order by NewID)
from Accounts
)
select NewID
, MatchedID
, Name
, AddDate
, Address
, phoneNumber
from NoDuplicates where rownum = 1
Although there's no reason you can't just use GROUP BY
assuming the address information is always duplicated too:
select NewID = min(NewID)
, MatchedID
, Name
, AddDate
, Address
, phoneNumber
from Accounts
group by MatchedID
, Name
, AddDate
, Address
, phoneNumber
Both of these are returning your expected result.
Edit after comment:
You can group related rows with a statement like this:
with NoDuplicates as
(
select *
, rownum = row_number() over (partition by MatchedID order by NewID)
from Accounts
where NewID <> MatchedID
)
select groupID = MatchedID
, Acct = MatchedID
, FirstName
, AddDate
, Address
, phoneNumber
from NoDuplicates where rownum = 1
union all
select groupID = coalesce(am.MatchedID, a.NewID)
, Acct = a.MatchedID
, a.FirstName
, a.AddDate
, a.Address
, a.phoneNumber
from Accounts a
-- join to the corresponding matched account
left join Accounts am on a.MatchedID = am.NewID and am.NewID <> am.MatchedID
where a.NewID = a.MatchedID
order by groupID, Acct
However, this essentially just groups by MatchedID
. If you want numbered groups starting from 1, you can add a DENSE_RANK
clause to the statement:
with NoDuplicates as
(
select *
, rownum = row_number() over (partition by MatchedID order by NewID)
from Accounts
where NewID <> MatchedID
)
, GroupedAcct as
(
select GroupID = MatchedID
, Acct = MatchedID
, FirstName
, AddDate
, Address
, phoneNumber
from NoDuplicates where rownum = 1
union all
select GroupID = coalesce(am.MatchedID, a.NewID)
, Acct = a.MatchedID
, a.FirstName
, a.AddDate
, a.Address
, a.phoneNumber
from Accounts a
-- join to the corresponding matched account
left join Accounts am on a.MatchedID = am.NewID and am.NewID <> am.MatchedID
where a.NewID = a.MatchedID
)
select GroupID = Dense_Rank() over (order by GroupID)
, Acct
, FirstName
, AddDate
, Address
, phoneNumber
from GroupedAcct
order by groupID, Acct
Upvotes: 1