Reputation: 21

SQL update field from another table by selecting a value based on table row

I have a db table which holds random personal details created by a data creation tool.

I want to use the values from this table to update fields in another table in order to mask live personal details when we are troubleshooting issues on another server.

Tables would be as follows:

Table 1: tblRandom_Data

Fields: First_Name | Surname

Table 2: tblCustomers

Fields: CustomerID[GUID] | First_Name | Surname | Address_Line1..........

I want to update the First_Name field on all rows in tblCustomers by selecting a random First_Name value from tblRandom_Data.

I have tried the following but this updates all rows with the same random first name selected. All values do not need to be unique, but there needs to be some differences.

Update tblCustomers 
        SET First_Name = (SELECT TOP 1 First_Name 
                          FROM tblRandom_Data
                          ORDER BY NEWID()
                         )

Upvotes: 0

Answers (3)

Gordon Linoff

Reputation: 1270421

The problem is that SQL Server "optimizes" the query to only run the subquery once.

I find that the simplest solution is a correlation clause -- even if that clause is meaningless. Assuming that tblCustomers.First_Name is never NULL:

UPDATE c 
    SET First_Name = (SELECT TOP 1 First_Name 
                      FROM tblRandom_Data rd
                      WHERE rd.First_Name <> c.First_Name
                      ORDER BY NEWID()
                     )
FROM tblCustomers c;

Another method is to "cycle" through the values:

UPDATE c
    SET First_Name = rd.First_Name
    FROM (SELECT c.*,
                 ROW_NUMBER() OVER (ORDER BY NEWID()) as seqnum
          FROM tblCustomers c
         ) c JOIN
         (SELECT rd.*,
                 ROW_NUMBER() OVER (ORDER BY NEWID()) as seqnum,
                 COUNT(*) OVER () as cnt
          FROM tblRandom_Data rd
         ) rd
         ON (c.seqnum % rd.cnt) = rd.seqnum - 1;

This also has the benefit of "balancing" the names. So if there are more customers than rows in the other table, the names will be assigned about the same number of times.

Upvotes: 1

Yogesh Sharma

Reputation: 50173

You can use apply :

select c.*, rd.first_name
from tblCustomers c cross apply
     ( select top (1) rd.first_name
       from tblRandom_Data rd
       where rd.address_line1 <> c.first_name -- you can use any column combination that would always be not matched with each other
       order by newid()
     ) rd;

Upvotes: 1

GMB

Reputation: 222582

Here is one way to do it with row_number() and an updatable cte:

with cte as (
    select 
        c.first_name, 
        c.surname, 
        r.first_name r_first_name, 
        r.surname r_surname,
        row_number() over(partition by c.customer_id order by newid()) rn
    from tblCustomers c
    cross join tblRandom_Data r 
)
update cte 
set first_name = r_first_name, surname = r_surname
where rn = 1

The cte generates a cartesian produt of both tables and affects a random number to each record within groups having the same customer_id. The outer query perform the actual update.

Demo on DB Fiddlde

Sample data:

select * from tblCustomers;

customer_id | first_name | surname
----------: | :--------- | :------
          1 | null       | null   
          2 | null       | null   
          3 | null       | null   
          4 | null       | null

select * from tblRandom_Data;

first_name | surname
:--------- | :------
foo        | bar    
baz        | zoo

After running the update:

select * from tblCustomers;

customer_id | first_name | surname
----------: | :--------- | :------
          1 | baz        | zoo    
          2 | foo        | bar    
          3 | baz        | zoo    
          4 | foo        | bar

Upvotes: 0

SQL update field from another table by selecting a value based on table row

Answers (3)

Related Questions