Reputation: 21
I have a db table which holds random personal details created by a data creation tool.
I want to use the values from this table to update fields in another table in order to mask live personal details when we are troubleshooting issues on another server.
Tables would be as follows:
Table 1: tblRandom_Data
Fields: First_Name | Surname
Table 2: tblCustomers
Fields: CustomerID[GUID] | First_Name | Surname | Address_Line1..........
I want to update the First_Name field on all rows in tblCustomers by selecting a random First_Name value from tblRandom_Data.
I have tried the following but this updates all rows with the same random first name selected. All values do not need to be unique, but there needs to be some differences.
Update tblCustomers
SET First_Name = (SELECT TOP 1 First_Name
FROM tblRandom_Data
ORDER BY NEWID()
)
Upvotes: 0
Views: 94
Reputation: 1270421
The problem is that SQL Server "optimizes" the query to only run the subquery once.
I find that the simplest solution is a correlation clause -- even if that clause is meaningless. Assuming that tblCustomers.First_Name
is never NULL
:
UPDATE c
SET First_Name = (SELECT TOP 1 First_Name
FROM tblRandom_Data rd
WHERE rd.First_Name <> c.First_Name
ORDER BY NEWID()
)
FROM tblCustomers c;
Another method is to "cycle" through the values:
UPDATE c
SET First_Name = rd.First_Name
FROM (SELECT c.*,
ROW_NUMBER() OVER (ORDER BY NEWID()) as seqnum
FROM tblCustomers c
) c JOIN
(SELECT rd.*,
ROW_NUMBER() OVER (ORDER BY NEWID()) as seqnum,
COUNT(*) OVER () as cnt
FROM tblRandom_Data rd
) rd
ON (c.seqnum % rd.cnt) = rd.seqnum - 1;
This also has the benefit of "balancing" the names. So if there are more customers than rows in the other table, the names will be assigned about the same number of times.
Upvotes: 1
Reputation: 50173
You can use apply
:
select c.*, rd.first_name
from tblCustomers c cross apply
( select top (1) rd.first_name
from tblRandom_Data rd
where rd.address_line1 <> c.first_name -- you can use any column combination that would always be not matched with each other
order by newid()
) rd;
Upvotes: 1
Reputation: 222582
Here is one way to do it with row_number()
and an updatable cte:
with cte as (
select
c.first_name,
c.surname,
r.first_name r_first_name,
r.surname r_surname,
row_number() over(partition by c.customer_id order by newid()) rn
from tblCustomers c
cross join tblRandom_Data r
)
update cte
set first_name = r_first_name, surname = r_surname
where rn = 1
The cte generates a cartesian produt of both tables and affects a random number to each record within groups having the same customer_id
. The outer query perform the actual update.
Sample data:
select * from tblCustomers;
customer_id | first_name | surname ----------: | :--------- | :------ 1 | null | null 2 | null | null 3 | null | null 4 | null | null
select * from tblRandom_Data;
first_name | surname :--------- | :------ foo | bar baz | zoo
After running the update:
select * from tblCustomers;
customer_id | first_name | surname ----------: | :--------- | :------ 1 | baz | zoo 2 | foo | bar 3 | baz | zoo 4 | foo | bar
Upvotes: 0