RMT
RMT

Reputation: 135

Populate a table column randomly with values from a list in PostgreSQL

I want to randomly populate a column in my table with the following values: Available, On Hold, Partial Downpayment, Sold/Unavailable, Token

I have tried using the following command but it populates the entire column with the first value it gets.

update mytable
set sold_status = (select (array['Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'])[floor(random() * 5 + 1)])

I know how to achieve this with a programming language, but I would prefer a PostgreSQL query to achive this.

Upvotes: 3

Views: 2264

Answers (4)

Vérace
Vérace

Reputation: 908

The accpeted answer is incorrect!

You can see this here (all of the code below is in the fiddle here):

CREATE TABLE tab AS
SELECT
  id,
  NULL AS rand_x5,
  NULL AS sold_status
FROM 
  generate_series(1, 10000) AS t(id);

and then run:

WITH r AS
(
  SELECT j, (RANDOM() * 5)::INT AS x
  FROM GENERATE_SERIES(1, 10000) AS j
)
UPDATE tab
SET 
  rand_x5 = r.x, 
  sold_status =
  CASE r.x                   -- note different brackets (ranges - inclusive/exclusive boundaries)
    WHEN 0 THEN 'Available'  -- only happens from [0 - 0.5)    
    WHEN 1 THEN 'On Hold'    --      happens from [0.5 - 1.5) &c.
    WHEN 2 THEN 'Partial'
    WHEN 3 THEN 'Sold'
    WHEN 4 THEN 'Token'
  END 
FROM r
WHERE tab.id = r.j;

and then run the aggregate query:

SELECT
  rand_x5, sold_status, COUNT(rand_x5), COUNT(sold_status)
FROM tab
GROUP by rand_x5, sold_status
ORDER BY rand_x5 DESC NULLS first;

Result:

rand_x5 sold_status count   count
      5        null  1047       0
      4       Token  2062    2062
      3        Sold  1995    1995
      2     Partial  1912    1912
      1     On Hold  2006    2006
      0   Available   978     978
SELECT 6   --  << 6 instead of 5

Now, there are two major problems with this:

  • there are 6 possible values (0 - 5), not 5 in the output of (RANDOM() * 5)::INT. Since you haven't dealt with the 5 in the CASE expression, sold_status becomes NULL

  • the first value (0) and the last (5) are skewed, in that they'll only have 1/2 of the number of values of the others (1 - 4) if you generate a large number of them - since they will only be generated over a range of 0.5 whereas the others are generated over a range of 1

The correct code for this should be FLOOR((RANDOM() * 5)) - see bottom of fiddle. The result when this is run is:

rand_x5 sold_status count   count
0   Available   1938    1938
1   On Hold 2048    2048
2   Partial 1983    1983
3   Sold    2001    2001
4   Token   2030    2030

No NULLs and all values are (within reason) equally balanced! The CASTing to INT in (RANDOM() * 5)::INT is the same as what the ROUND() function does and which leads to the errors. FLOOR always rounds down and this issue doesn't occur.

Upvotes: -1

sprutex
sprutex

Reputation: 1430

Since the header of this page is "Populate a table column ..." I am putting here my method to create random values in column from a list:

CREATE TABLE IF NOT EXISTS cars(
    id SERIAL PRIMARY KEY,
    brand VARCHAR(30)
);

INSERT INTO cars (
    brand
)
SELECT
    (array['BMW','AUDI','MERCEDES-BENZ'])[floor(random() * 3 + 1)]
FROM generate_series(1, 10);

It will create 10 rows with random values from brand's list.

Upvotes: 1

jjanes
jjanes

Reputation: 44202

While PostgreSQL tracks the volatility of simple expressions, it does not track the volatility of sub-selects which do not contain references to the outer query. You could fix this by introducing an artificial reference to the outer query, but in your case you don't need a sub-select in the first place, so just get rid of it.

update mytable
set sold_status = (array[
        'Available', 'On Hold', 'Partial Downpayment', 'Sold/Unavailable', 'Token Recieved'
    ]) [floor(random() * 5 + 1)];

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269933

Postgres thinks it is doing you a favor by running random() only once -- forgetting that random() is a volatile function.

You can fix this using case:

update mytable
    set sold_status = (case (random()*5)::int when 0 then 'Available' when 1 then 'On Hold' when 2 then 'Partial Downpayment' when 3 then 'Sold/Unavailable' when 4 then 'Token Recieved' end);

Here is a db<>fiddle.

Upvotes: 1

Related Questions