user402516
user402516

Reputation: 528

How to generate random data on an associated table in postgres?

I have two tables: users and results. A user has many results.

I need to generate 10 million records in our users table and I was able to do this using the generate_series function in postgres.

Now, I want to generate millions of rows in the results table. But I want there to be a certain distribution of these rows. For example, I want 50% of the users to only have 1 result. I want 40% to have 2 results. And I want 10% to have 5 results.

Is there a way to generate this random data in the results table in postgres?

Upvotes: 1

Views: 364

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269953

Yes:

select u.user_id, gs.result
from (select u.*,
             ntile(10) over (order by rand()) as decile
      from users u
     ) u cross join lateral
     generate_series(1, (case when u.decile <= 5 then 1 when u.decile <= 9 then 2 else 5 end)) gs(result);

This generates the rows. You can fill in with the data you want.

Upvotes: 1

Related Questions