Euan Hollidge
Euan Hollidge

Reputation: 607

Select random value for each row

I'm trying to select a new random value from a column in another table for each row of a table I'm updating. I'm getting the random value, however I can't get it to change for each row. Any ideas? Here's the code:

UPDATE srs1.courseedition
SET ta_id = teacherassistant.ta_id
FROM srs1.teacherassistant
WHERE (SELECT ta_id FROM srs1.teacherassistant ORDER BY RANDOM()
       LIMIT 1) = teacherassistant.ta_id

Upvotes: 2

Views: 1061

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271151

My guess is that Postgres is optimizing out the subquery, because it has no dependencies on the outer query. Have you simply considered using a subquery?

UPDATE srs1.courseedition
    SET ta_id = (SELECT ta.ta_id
                 FROM srs1.teacherassistant ta
                 ORDER BY RANDOM()
                 LIMIT 1
                );

I don't think this will fix the problem (smart optimizers, alas). But, if you correlate to the outer query, then it should run each time. Perhaps:

UPDATE srs1.courseedition ce
    SET ta_id = (SELECT ta.ta_id
                 FROM srs1.teacherassistant ta
                 WHERE ce.ta_id IS NULL  -- or something like that
                 ORDER BY RANDOM()
                 LIMIT 1
                );

You can replace the WHERE clause with something more nonsensical such as WHERE COALESCE(ca.ta_id, '') IS NOT NULL.

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 659197

This following solution should be faster by order(s) of magnitude than running a correlated subquery for every row. N random sorts over the whole table vs. 1 random sort. The result is just as random, but we get a perfectly even distribution with this method, whereas independent random picks like in Gordon's solution can (and probably will) assign some rows more often than others. There are different kinds of "random". Actual requirements for "randomness" need to be defined carefully.

Assuming the number of rows in courseedition is bigger than in teacherassistant.

To update all rows in courseedition:

UPDATE srs1.courseedition c1
SET    ta_id = t.ta_id
FROM  (
   SELECT row_number() OVER (ORDER BY random()) - 1 AS rn  -- random order
        , count(*) OVER () As ct                           -- total count
        , ta_id
   FROM   srs1.teacherassistant           -- smaller table
   ) t
JOIN (
   SELECT row_number() OVER () - 1 AS rn  -- arbitrary order
        , courseedition_id                -- use actual PK of courseedition
   FROM   srs1.courseedition              -- bigger table
   ) c ON c.rn%t.ct = t.rn                -- rownumber of big modulo count of small table
WHERE  c.courseedition_id = c1.courseedition_id;

Notes

Match the random rownumber of the bigger table modulo the count of the smaller table to the rownumber of the smaller table.

row_number() - 1 to get a 0-based index. Allows using the modulo operator % more elegantly.

Random sort for one table is enough. The smaller table is cheaper. The second can have any order (arbitrary is cheaper). The assignment after the join is random either way. Perfect randomness would only be impaired indirectly if there are regular patterns in sort order of the bigger table. In this unlikely case, apply ORDER BY random() to the bigger table to eliminate any such effect.

Upvotes: 0

Related Questions