johanrex
johanrex

Reputation: 419

Replacing values with random values from other table. Oracle

I want to update names in a table with random names from another table. There's real user names in the table Users. There's made up names in the table tmp_users. I'd like to update all the names in Users table with random names from the tmp_users table. The idea is to anonymize real production data with fake customers. There are fewer entries in the tmp_users table so I don't think I can correlate on an id.

The problem I have is that all users gets set to the same name.

Some sample data:

create table users
(
  name varchar2(50)
);

create table tmp_users
(
  name varchar2(50)
);

insert into users values ('Cora');
insert into users values ('Rayna');
insert into users values ('Heidi');
insert into users values ('Gilda');
insert into users values ('Dorothy');
insert into users values ('Elena');
insert into users values ('Providencia');
insert into users values ('Louetta');
insert into users values ('Portia');
insert into users values ('Rodrick');
insert into users values ('Rocco');
insert into users values ('Nelson');
insert into users values ('Derrick');
insert into users values ('Everett');
insert into users values ('Nisha');
insert into users values ('Amy');
insert into users values ('Hyun');
insert into users values ('Brendon');
insert into users values ('Gabriela');
insert into users values ('Melina');

insert into tmp_users values ('Snow White');
insert into tmp_users values ('Cinderella');
insert into tmp_users values ('Aurora');
insert into tmp_users values ('Ariel');
insert into tmp_users values ('Belle');
insert into tmp_users values ('Jasmine');
insert into tmp_users values ('Pocahontas');
insert into tmp_users values ('Mulan');
insert into tmp_users values ('Tinker Bell');
insert into tmp_users values ('Anna');
insert into tmp_users values ('Elsa');

--Wrong, sets all users to the same random name
update users set name = (select name from (select name from tmp_users order by sys_guid()) where rownum = 1);

--Wrong, sets all users to the same random name
update users set name = (select name from (select name from tmp_users order by dbms_random.value) where rownum = 1);

When doing this:

select * from users;

The result I get is something like this, which I don't want.

Cinderella
Cinderella
Cinderella
Cinderella
Cinderella
...

I'd like to assign a random name to each row in the Users table. Not the same name to all rows. I'd like somthing like this:

Mulan
Cinderella
Belle
Elsa
Jasmine
Tinker Bell
...

Any idea how this can be done? I'm using Oracle Database 11g Express Edition 11.2.0.2.0. It would be easy to do with a cursor but I'm trying to figure out how to do it with a set operation.

Update:

I've now tested on two different Oracle versions. The correlated subquery solution doesn't work on Oracle Database 11g Express Edition 11.2.0.2.0. But it does work sometimes on Oracle Database 11g Enterprise Edition 11.2.0.4.0. On one table it works all the time and on another it never works.

Upvotes: 4

Views: 3817

Answers (4)

tbone
tbone

Reputation: 15473

Here's one more approach, which uses rowid and modulus:

MERGE into users_tab u
USING (
    select 
        actual.row_id as actual_rowid,
        actual.rnum actual_rnum, 
        actual.name actual_name, 
        fake.rnum fake_rnum, 
        fake.name fake_name, 
        mod(actual.rnum, fake_count.cnt) modulus
    from
    (
    select rownum rnum, name, rowid as row_id
    from users_tab
    ) actual,
    (
    select rownum-1 rnum, name
    from (select distinct name from tmp_users_tab)
    ) fake,
    (select count(distinct name) cnt from tmp_users_tab) fake_count
    where mod(actual.rnum, fake_count.cnt) = fake.rnum
) x
ON (x.actual_rowid = u.rowid)
WHEN MATCHED THEN UPDATE
    set name = x.fake_name;

Not sure about how this will perform on a very large user table, however. Its not random, but follows a series of fake names. So if you have 10 fake names, records 1->10 in users will be assigned fake names 1->10, and user 11 will start over with fake name #1.

The USING query has extra fields for testing.

Upvotes: 1

Francisco Sitja
Francisco Sitja

Reputation: 1003

Testing with 11.2.0.4 the following works, similar to what @VR46 suggested:

SQL> UPDATE users u
  2     SET name = (SELECT name
  3                   FROM (SELECT NAME,
  4                                row_number() over(ORDER BY dbms_random.value) rn
  5                           FROM tmp_users) tu
  6                    WHERE u.name IS NOT NULL
  7                         AND rn = 1);
20 rows updated

SQL> select * from users;
NAME
--------------------------------------------------
Ariel
Aurora
Belle
Ariel
Anna
Mulan
Aurora
Ariel
Mulan
Tinker Bell
Mulan
Ariel
Aurora
Pocahontas
Pocahontas
Aurora
Snow White
Mulan
Aurora
Anna
20 rows selected

Upvotes: 3

Alex Poole
Alex Poole

Reputation: 191275

You need some correlation, as @VR46 suggested; but sys_guid() doesn't work for this (in 11gR2 anyway; I think the optimiser is only evaluating it once in this scenario for some reason, perhaps); you can use dbms_random.value though:

update users u set name = (
  select name from (
    select name from tmp_users order by dbms_random.value
  )
  where rownum = 1 and u.name is not null
);

NAME                                             
--------------------------------------------------
Jasmine                                           
Tinker Bell                                       
Ariel                                             
Elsa                                              
Elsa                                              
Elsa                                              
Belle                                             
Snow White                                        
...

If you don't want aubquery you could use keep dense rank instead:

update users u set name = (
  select max(name) keep (dense_rank first order by dbms_random.value)
  from tmp_users
  where u.name is not null
);

NAME                                             
--------------------------------------------------
Mulan                                             
Anna                                              
Snow White                                        
Elsa                                              
Tinker Bell                                       
Belle                                             
Belle                                             
Elsa                                              
...

The correlation just has to be true; if you have null values in your user table this will update them to null, and you could use a different condition if that is an issue.

From comments it seems you have the same problem with dbms_random in 11.2.0.2 as I have with sys_guid in 11.2.0.3 and 11.2.0.4. If your users table also has a numeric unique/primary key such as an ID you can use that for the correlation and pass it into the value function, which might make a difference, but I don't have a suitable instance to test against:

update users u set name = (
  select max(name) keep (dense_rank first order by dbms_random.value(0, u.id))
  from tmp_users
);

MOS note 420779.1 includes the line "Including DBMS_RANDOM.VALUE in a subquery may or may not work depending on the optimization and execution code path chosen", which seems to be the problem here.

You could also try variations with merge, e.g. (again assuming there's an ID you can use):

merge into users u
using (
  select u.id, max(tu.name) keep (dense_rank first 
    order by dbms_random.value(0, u.id)) as name
  from users u
  cross join tmp_users tu
  group by u.id
) tu
on (tu.id = u.id)
when matched then update set u.name = tu.name;

The cross join may make this impractical though, depending on the number of rows you actually have in each table.

Upvotes: 1

Pரதீப்
Pரதீப்

Reputation: 93704

I think correlating the sub-query will pull random names from Sub-Query

UPDATE users 
SET    name = (SELECT name 
               FROM   (SELECT name 
                       FROM   tmp_users tu 
                       ORDER  BY Sys_guid()) 
               WHERE  ROWNUM = 1
                 AND users.name <> name ); 

Upvotes: 1

Related Questions