Reputation: 13
I have a very big table (~300 million rows) with the following structure: my_table(id, group, chunk, new_id), where chunk and new_id are set to NULL.
I want to set the rows of each group to a random chunk with distinct new_id in the chunk. Each chunk should be of fixed size of 100.
For example if group A has 1278 rows, they should go into 13 chunks (0-12), 12 chunks with 100 rows s.t. new_id are in range (0-99) and another single chunk with 78 rows s.t. new_id are in range (0-77).
The organization into chunks and within the chunks should be a random permutation where each row in A is assigned with a unique (chunk, new_id) tuple.
I'm successfully doing it using pandas but it takes hours, mostly due to memory and bandwidth limitations.
Is it possible to execute using only a SQL query? I'm using postgres 9.6.
Upvotes: 1
Views: 241
Reputation: 222482
You could do this with row_number()
:
select id, group, rn / 100 chunk, rn % 100 new_id
from (select t.*, row_number() over(order by random()) - 1 rn from mytable t) t
The inner query assigns a random integer number to each record (starting at 0). The outer query does arithmetic to compute the chunk and new id.
If you want an update
query:
update mytable t set chunk = x.rn / 3, new_id = x.rn % 3
from (select id, row_number() over(order by random()) - 1 rn from mytable t) x
where x.id = t.id
Demo on DB Fiddle for a dataset of 20 records with chunks of 3 records .
Before:
id | grp | chunk | new_id -: | --: | ----: | -----: 1 | 1 | nullnull 2 | 2 | nullnull 3 | 3 | nullnull 4 | 4 | nullnull 5 | 5 | nullnull 6 | 6 | nullnull 7 | 7 | nullnull 8 | 8 | nullnull 9 | 9 | nullnull 10 | 10 | nullnull 11 | 11 | nullnull 12 | 12 | nullnull 13 | 13 | nullnull 14 | 14 | nullnull 15 | 15 | nullnull 16 | 16 | nullnull 17 | 17 | nullnull 18 | 18 | nullnull 19 | 19 | nullnull 20 | 20 | nullnull
After:
id | grp | chunk | new_id -: | --: | ----: | -----: 19 | 19 | 0 | 0 11 | 11 | 0 | 1 20 | 20 | 0 | 2 12 | 12 | 1 | 0 14 | 14 | 1 | 1 17 | 17 | 1 | 2 3 | 3 | 2 | 0 8 | 8 | 2 | 1 5 | 5 | 2 | 2 13 | 13 | 3 | 0 10 | 10 | 3 | 1 2 | 2 | 3 | 2 16 | 16 | 4 | 0 18 | 18 | 4 | 1 6 | 6 | 4 | 2 1 | 1 | 5 | 0 15 | 15 | 5 | 1 7 | 7 | 5 | 2 4 | 4 | 6 | 0 9 | 9 | 6 | 1
Upvotes: 1