Reputation: 21

Update data in table1 where duplicates in table2

I have two tables: Users and Results. The Users table contains duplicate rows, and the Results table references all of them. The user below was created three times. I need to update the Results table so that rows pointing to UserId 2 and 3 point to UserId 1 instead, so that all results can be viewed under a single user.
This is easy with only a few users and a few results, but in my case I have 500 duplicated users and 30,000 results.

I am using SQL Server Express 2014.

I would really appreciate any help with this!

Edit: I mistyped the column names in ResultsTable. I'm sorry if that caused confusion.

UserTable

UserId   Fname   LName
1        Georg   Smith
2        Georg   Smith
3        Georg   Smith

ResultsTable

ResultId   UserRefId
1          1
2          2
3          3
4          1

I have managed to select the duplicates from the user table, but I don't know how to proceed further.

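;WITH T AS
(
    -- Count how many rows share the same first and last name
    SELECT *, COUNT(*) OVER (PARTITION BY Fname + Lname) AS Cnt
    FROM TestDatabase.Users
)
SELECT Id, Fname, Lname
FROM T
WHERE Cnt > 1   -- Cnt > 2 would miss users that were created exactly twice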

Upvotes: 1

Answers (3)

Code Different

Reputation: 93191

Your ResultTable has 2 columns with the same UserId name. I changed the second to UserId2 for the query below:

;WITH cte AS
(
    -- UserId2 is the renamed user reference column, so the join must use it
    SELECT      R.UserId, R.UserId2,
                MIN(U.UserId) OVER (PARTITION BY U.FName, U.LName) AS OriginalUserId
    FROM        ResultTable     R
    INNER JOIN  UserTable       U       ON R.UserId2 = U.UserId
)
UPDATE cte
SET    UserId2 = OriginalUserId
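
For reference, here is a minimal sketch of the same idea written against the column names from the question's edit (ResultId and UserRefId). It uses table variables and the question's sample rows so it can be run on its own; the names @UserTable, @ResultsTable, and keep are assumptions made for illustration. It computes the minimum over the user table alone, so the surviving ID does not depend on which duplicates happen to have results.

```sql
-- Sample data mirroring the question; table variables keep the sketch self-contained.
DECLARE @UserTable TABLE (UserId int PRIMARY KEY, FName varchar(50), LName varchar(50));
DECLARE @ResultsTable TABLE (ResultId int PRIMARY KEY, UserRefId int);

INSERT INTO @UserTable VALUES (1, 'Georg', 'Smith'), (2, 'Georg', 'Smith'), (3, 'Georg', 'Smith');
INSERT INTO @ResultsTable VALUES (1, 1), (2, 2), (3, 3), (4, 1);

-- Map every user to the lowest UserId that shares the same first and last name.
WITH keep AS
(
    SELECT UserId,
           MIN(UserId) OVER (PARTITION BY FName, LName) AS OriginalUserId
    FROM   @UserTable
)
UPDATE r
SET    r.UserRefId = k.OriginalUserId
FROM   @ResultsTable AS r
INNER JOIN keep AS k ON k.UserId = r.UserRefId
WHERE  r.UserRefId <> k.OriginalUserId;   -- skip rows that already point at the kept user

-- Inspect the outcome.
SELECT ResultId, UserRefId FROM @ResultsTable ORDER BY ResultId;
```

With the sample rows, all four results should end up with UserRefId = 1.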

Upvotes: 2

supergrady

Reputation: 1322

You are on the right track with the CTE. The ROW_NUMBER() function can be used to flag duplicate UserIds. You can then join the CTE into the FROM clause of your UPDATE statement to find the UserIds you want to replace, and join it a second time to find the UserIds you want to replace them with.

;WITH cteDedup AS(
  SELECT
    UserId
   ,FName
   ,LName
   ,ROW_NUMBER() OVER(PARTITION BY FName, LName ORDER BY UserID ASC) AS row_num
  FROM UserTable
 )

 UPDATE rt
 SET UserId = original.UserId
 FROM ResultsTable rt
 JOIN cteDedup dupe
   ON rt.UserId = dupe.UserId
 JOIN cteDedup original
   ON dupe.FName = original.FName
   AND dupe.LName = original.LName
 WHERE dupe.row_num <> 1
   AND original.row_num = 1

See the SQLFiddle
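
Before running the UPDATE against 30,000 rows, it may help to preview what it would change. The sketch below reuses the same two joins in a plain SELECT. It assumes the reference column is called UserRefId, as in the question's edit, and the second duplicate group (Anna Lee) and result 5 are made-up sample data added only to show that several groups are handled at once.

```sql
-- Sample data: the question's rows plus one hypothetical extra duplicate group.
DECLARE @UserTable TABLE (UserId int PRIMARY KEY, FName varchar(50), LName varchar(50));
DECLARE @ResultsTable TABLE (ResultId int PRIMARY KEY, UserRefId int);

INSERT INTO @UserTable VALUES
    (1, 'Georg', 'Smith'), (2, 'Georg', 'Smith'), (3, 'Georg', 'Smith'),
    (4, 'Anna', 'Lee'), (5, 'Anna', 'Lee');
INSERT INTO @ResultsTable VALUES (1, 1), (2, 2), (3, 3), (4, 1), (5, 5);

-- Number the users within each name group; row_num = 1 is the one to keep.
WITH cteDedup AS
(
    SELECT UserId, FName, LName,
           ROW_NUMBER() OVER (PARTITION BY FName, LName ORDER BY UserId ASC) AS row_num
    FROM   @UserTable
)
-- List only the results that currently point at a duplicate, with the ID they would get.
SELECT rt.ResultId,
       rt.UserRefId    AS CurrentUserId,
       original.UserId AS NewUserId
FROM   @ResultsTable AS rt
JOIN   cteDedup AS dupe
  ON   rt.UserRefId = dupe.UserId
JOIN   cteDedup AS original
  ON   dupe.FName = original.FName
 AND   dupe.LName = original.LName
WHERE  dupe.row_num <> 1
  AND  original.row_num = 1
ORDER BY rt.ResultId;
```

For this sample data the preview should list results 2 and 3 moving to user 1 and result 5 moving to user 4. Results 1 and 4 already point at the kept user and are not listed.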

Upvotes: 0

Alex Kudryashev

Reputation: 9480

A slightly tricky query looks like this:

;with t as (
select fname+lname name,id,
ROW_NUMBER() over(partition by fname+lname order by id) rn
from #users
)
--for test purpose comment next 2 lines
update #results
set userid=t1.id
--and uncomment the next one
--select t.name,t.id,userid,res,t1.id id1--,(select top 1 id from t t1 where  t1.name=t.name and t.rn=1) id1
from t
inner join #results r on t.id=r.userid
inner join t t1 on t.name=t1.name and t1.rn=1

You can then delete the duplicate users:

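;with t as (
-- #users has no name column, so build it from fname and lname as in the first query
select fname+lname name,id,
ROW_NUMBER() over(partition by fname+lname order by id) rn
from #users
)
delete t where rn>1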
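
Since the delete is only safe once every result has been repointed, one option is to run both steps in a single transaction and check for orphaned results before committing. The following is a rough sketch under assumptions: the column layout of #users and #results (including the resultid column) is guessed from the queries above, and the sketch partitions on fname and lname separately rather than on the concatenated name, so that two different name pairs that concatenate to the same string are not merged.

```sql
-- Assumed table shapes; drop leftovers first so the sketch can be rerun (SQL Server 2014 syntax).
IF OBJECT_ID('tempdb..#results') IS NOT NULL DROP TABLE #results;
IF OBJECT_ID('tempdb..#users') IS NOT NULL DROP TABLE #users;

CREATE TABLE #users (id int PRIMARY KEY, fname varchar(50), lname varchar(50));
CREATE TABLE #results (resultid int PRIMARY KEY, userid int);

INSERT INTO #users VALUES (1, 'Georg', 'Smith'), (2, 'Georg', 'Smith'), (3, 'Georg', 'Smith');
INSERT INTO #results VALUES (1, 1), (2, 2), (3, 3), (4, 1);

BEGIN TRANSACTION;

-- Step 1: repoint results from duplicate users (rn > 1) to the kept user (rn = 1).
WITH t AS
(
    SELECT id, fname, lname,
           ROW_NUMBER() OVER (PARTITION BY fname, lname ORDER BY id) AS rn
    FROM   #users
)
UPDATE r
SET    r.userid = t1.id
FROM   t
INNER JOIN #results AS r ON t.id = r.userid
INNER JOIN t AS t1 ON t.fname = t1.fname AND t.lname = t1.lname AND t1.rn = 1
WHERE  t.rn > 1;

-- Step 2: remove the duplicate users themselves.
WITH t AS
(
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY fname, lname ORDER BY id) AS rn
    FROM   #users
)
DELETE FROM t WHERE rn > 1;

-- Step 3: undo everything if any result now references a user that no longer exists.
IF EXISTS (SELECT 1
           FROM   #results AS r
           WHERE  NOT EXISTS (SELECT 1 FROM #users AS u WHERE u.id = r.userid))
BEGIN
    ROLLBACK TRANSACTION;
END
ELSE
BEGIN
    COMMIT TRANSACTION;
END;

SELECT resultid, userid FROM #results ORDER BY resultid;
```

If the real tables have a foreign key from results to users, the delete would fail on its own instead of leaving orphans, and the explicit check becomes a second line of defence.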

Upvotes: 0
