ltt
ltt

Reputation: 427

How to select all rows from n random groups in postgres

This question pertains to formulating a query in PostgreSQL Suppose I have the following table:

record   entity   docid   sentencid   
1   A   123   1231   
1   A   123   1232
1   A   100   1001
1   C   100   1002
1   B   121   1212
1   B   123   1234
2   B   102   1021
2   D   111   1111
2   D   102   1022
2   E   101   1011   
3   C   111   1115
3   C   111   1113
3   C   111   1114

Is there a PostgresSQL query that I can use to select all rows for n (or less) random groups of entities for each record in this table? Lets say n is 2. So the query should select all rows for record 3 and all rows for any 2 random entity groups of record 1 and 2. The final result should be ordered by accession, entity, docid, sentenceid.

Here is an example result with n=2:

record   entity   docid   sentencid   
1   A   100   1001
1   A   123   1231   
1   A   123   1232
1   B   121   1212
1   B   123   1234
2   D   102   1022
2   D   111   1111
2   E   101   1011   
3   C   111   1113
3   C   111   1114
3   C   111   1115

assuming that the entities A and B were randomly selected from the set of entities (A,B,C) for record 1 and the entities D and E were randomly selected from the set of entities (B,D,E) for record 2. The n entities should be selected randomly for each record.

I have searched extensively for answers to this question but did not find any working query. Thank you for looking into this!

Upvotes: 1

Views: 316

Answers (1)

FuzzyTree
FuzzyTree

Reputation: 32392

You can use row_number with a random() order to randomly select n entities per record group. Then join this to your main table

select * from Table1 t1
join (
    select * from (
        select record, entity,
            row_number() over (partition by record order by random()) rn
        from Table1
        group by record, entity
    ) t where rn <= 2
) t2 on t1.record = t2.record and t1.entity = t2.entity

Demo

Upvotes: 1

Related Questions