Reputation: 3095
I'm not even sure this is possible in a query, but here goes. I'm cleaning up data coming into our system from a 3rd party. They're creating a not-so-unique unique string for each person in a group that consists of the first 3 letters of the first name and then the dob in yyyymmdd format. On the few that are duplicates, I want to append -1, -2, -3 etc. to that ID so they truly are unique.
JOH19700101 0001 JOHN SMITH 1970-01-01
JOH19700101 0001 JOHNSON SMITH 1970-01-01
JOH19700101 0002 JOHN SMITH 1970-01-01
In the above, the first two IDs (JOH19700101) need to have -1 and -2 appended because they are in the same group (0001). The 3rd can remain the same as he's in a different group (0002).
Selecting the duplicates was easy enough:
SELECT x.id, x.uid, x.group, x.name, x.dob
FROM import x
INNER JOIN (
SELECT uid
FROM import
GROUP BY group, uid
HAVING COUNT(uid) > 1) z ON x.uid = z.uid;
Is there a way to add an update to this so that the first two samples would get new UIDs of JOH19700101-1 and JOH19700101-2 respectively?
Upvotes: 1
Views: 59
Reputation: 603
This will give you unique uid values although it could benefit from one small refinement. Right now the number that is appended keeps incrementing, so your first pair of dupes will get -1 and -2 appended to their uids, then the next pair -3 and -4, and so on.
It should be possible to reset @inc somehow, however as it stands this will ensure that all of your uid values are unique, and that identical uids in different groups will remain unchanged.
SET @inc :=0;
UPDATE
`import` i
JOIN (
SELECT uid, `group`
FROM `import`
GROUP BY `group`, uid
HAVING COUNT(uid) > 1) d ON
i.uid = d.uid AND
i.group = d.group
SET
i.uid = CONCAT(i.uid, '-',@inc:=@inc+1)
Upvotes: 1