Reputation: 135
I want to randomly populate a column in my table with the following values: Available, On Hold, Partial Downpayment, Sold/Unavailable, Token
I have tried using the following command but it populates the entire column with the first value it gets.
update mytable
set sold_status = (select (array['Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'])[floor(random() * 5 + 1)])
I know how to achieve this with a programming language, but I would prefer a PostgreSQL query to achive this.
Upvotes: 3
Views: 2264
Reputation: 908
The accpeted answer is incorrect!
You can see this here (all of the code below is in the fiddle here):
CREATE TABLE tab AS
SELECT
id,
NULL AS rand_x5,
NULL AS sold_status
FROM
generate_series(1, 10000) AS t(id);
and then run:
WITH r AS
(
SELECT j, (RANDOM() * 5)::INT AS x
FROM GENERATE_SERIES(1, 10000) AS j
)
UPDATE tab
SET
rand_x5 = r.x,
sold_status =
CASE r.x -- note different brackets (ranges - inclusive/exclusive boundaries)
WHEN 0 THEN 'Available' -- only happens from [0 - 0.5)
WHEN 1 THEN 'On Hold' -- happens from [0.5 - 1.5) &c.
WHEN 2 THEN 'Partial'
WHEN 3 THEN 'Sold'
WHEN 4 THEN 'Token'
END
FROM r
WHERE tab.id = r.j;
and then run the aggregate query:
SELECT
rand_x5, sold_status, COUNT(rand_x5), COUNT(sold_status)
FROM tab
GROUP by rand_x5, sold_status
ORDER BY rand_x5 DESC NULLS first;
Result:
rand_x5 sold_status count count
5 null 1047 0
4 Token 2062 2062
3 Sold 1995 1995
2 Partial 1912 1912
1 On Hold 2006 2006
0 Available 978 978
SELECT 6 -- << 6 instead of 5
Now, there are two major problems with this:
there are 6 possible values (0 - 5), not 5 in the output of (RANDOM() * 5)::INT
. Since you haven't dealt with the 5 in the CASE
expression, sold_status
becomes NULL
the first value (0) and the last (5) are skewed, in that they'll only have 1/2 of the number of values of the others (1 - 4) if you generate a large number of them - since they will only be generated over a range of 0.5 whereas the others are generated over a range of 1
The correct code for this should be FLOOR((RANDOM() * 5))
- see bottom of fiddle. The result when this is run is:
rand_x5 sold_status count count
0 Available 1938 1938
1 On Hold 2048 2048
2 Partial 1983 1983
3 Sold 2001 2001
4 Token 2030 2030
No NULL
s and all values are (within reason) equally balanced! The CAST
ing to INT
in (RANDOM() * 5)::INT
is the same as what the ROUND() function does and which leads to the errors. FLOOR always rounds down and this issue doesn't occur.
Upvotes: -1
Reputation: 1430
Since the header of this page is "Populate a table column ..."
I am putting here my method to create random values in column from a list:
CREATE TABLE IF NOT EXISTS cars(
id SERIAL PRIMARY KEY,
brand VARCHAR(30)
);
INSERT INTO cars (
brand
)
SELECT
(array['BMW','AUDI','MERCEDES-BENZ'])[floor(random() * 3 + 1)]
FROM generate_series(1, 10);
It will create 10 rows with random values from brand's list.
Upvotes: 1
Reputation: 44202
While PostgreSQL tracks the volatility of simple expressions, it does not track the volatility of sub-selects which do not contain references to the outer query. You could fix this by introducing an artificial reference to the outer query, but in your case you don't need a sub-select in the first place, so just get rid of it.
update mytable
set sold_status = (array[
'Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'
]) [floor(random() * 5 + 1)];
Upvotes: 2
Reputation: 1269933
Postgres thinks it is doing you a favor by running random()
only once -- forgetting that random()
is a volatile function.
You can fix this using case
:
update mytable
set sold_status = (case (random()*5)::int when 0 then 'Available' when 1 then 'On Hold' when 2 then 'Partial Downpayment' when 3 then 'Sold/Unavailable' when 4 then 'Token Recieved' end);
Here is a db<>fiddle.
Upvotes: 1