Reputation: 607
I'm trying to select a new random value from a column in another table for each row of a table I'm updating. I'm getting the random value, however I can't get it to change for each row. Any ideas? Here's the code:
UPDATE srs1.courseedition
SET ta_id = teacherassistant.ta_id
FROM srs1.teacherassistant
WHERE (SELECT ta_id FROM srs1.teacherassistant ORDER BY RANDOM()
LIMIT 1) = teacherassistant.ta_id
Upvotes: 2
Views: 1061
Reputation: 1271151
My guess is that Postgres is optimizing out the subquery, because it has no dependencies on the outer query. Have you simply considered using a subquery?
UPDATE srs1.courseedition
SET ta_id = (SELECT ta.ta_id
FROM srs1.teacherassistant ta
ORDER BY RANDOM()
LIMIT 1
);
I don't think this will fix the problem (smart optimizers, alas). But, if you correlate to the outer query, then it should run each time. Perhaps:
UPDATE srs1.courseedition ce
SET ta_id = (SELECT ta.ta_id
FROM srs1.teacherassistant ta
WHERE ce.ta_id IS NULL -- or something like that
ORDER BY RANDOM()
LIMIT 1
);
You can replace the WHERE
clause with something more nonsensical such as WHERE COALESCE(ca.ta_id, '') IS NOT NULL
.
Upvotes: 1
Reputation: 659197
This following solution should be faster by order(s) of magnitude than running a correlated subquery for every row. N random sorts over the whole table vs. 1 random sort. The result is just as random, but we get a perfectly even distribution with this method, whereas independent random picks like in Gordon's solution can (and probably will) assign some rows more often than others. There are different kinds of "random". Actual requirements for "randomness" need to be defined carefully.
Assuming the number of rows in courseedition
is bigger than in teacherassistant
.
To update all rows in courseedition
:
UPDATE srs1.courseedition c1
SET ta_id = t.ta_id
FROM (
SELECT row_number() OVER (ORDER BY random()) - 1 AS rn -- random order
, count(*) OVER () As ct -- total count
, ta_id
FROM srs1.teacherassistant -- smaller table
) t
JOIN (
SELECT row_number() OVER () - 1 AS rn -- arbitrary order
, courseedition_id -- use actual PK of courseedition
FROM srs1.courseedition -- bigger table
) c ON c.rn%t.ct = t.rn -- rownumber of big modulo count of small table
WHERE c.courseedition_id = c1.courseedition_id;
Match the random rownumber of the bigger table modulo the count of the smaller table to the rownumber of the smaller table.
row_number() - 1
to get a 0-based index. Allows using the modulo operator %
more elegantly.
Random sort for one table is enough. The smaller table is cheaper. The second can have any order (arbitrary is cheaper). The assignment after the join is random either way. Perfect randomness would only be impaired indirectly if there are regular patterns in sort order of the bigger table. In this unlikely case, apply ORDER BY random()
to the bigger table to eliminate any such effect.
Upvotes: 0