Reputation: 407
I have a table people(gid int, name varchar) with data:
gid name
1 Bill
2 Will
2 Musk
2 Jack
1 Martin
1 Jorge
now I want to get a result set like below:
gid representative(varchar)
1 Bill, Martin
2 Will, Musk
I mean to list out each group with two of people's names of that group as representative. Here the order doesn't matter, the count matters. How to do it in one query statement, i.e. in PG?
After multiple trying for a long while, I can achieve the goal by this statement:
with indexed as(
select gid,name,row_number() over (partition by gid) as index
from people
),filtered as(
select gid, name
from indexed where index<3
)
select gid,string_agg(name,',')
from filtered
group by gid;
However, the actual table has huge number rows and more complicated schema. It consumes long time to just create the first CTE table "indexed" because of traversing the whole table. Is there better resolution to direct database engine to select out just parts of rows for each group at first, without full traversing?
Upvotes: 0
Views: 66
Reputation: 407
A colleague remainds me of array_sample, and by lucky coincidence, I don't care which rows would win out( here which two names should be selected out). So the solution in my case is:
select gid,array_sample(array_agg(name),2) from people group by gid
I tested it on my actual "big" table with hundred million rows, It run obviously much faster than my old CTE solution.
Upvotes: 0
Reputation: 7846
One option is to use Row_Number() analytic function and STRING_AGG()
-- S a m p l e D a t a :
Create Table People As
( Select 1 as gid, 'Bill' as name Union All
Select 2, 'Will' Union All
Select 2, 'Musk' Union All
Select 2, 'Jack' Union All
Select 1, 'Martin' Union All
Select 1, 'Jorge' )
-- S Q L :
SELECT gid, STRING_AGG(name, ', ') as names
FROM ( Select gid, name, Row_Number() Over(Partition By gid Order By gid) as rn
From People )
WHERE rn <= 2
GROUP BY gid
/*
gid names
--- ---------------
1 Bill, Martin
2 Will, Musk */
See the fiddle here.
Upvotes: 0
Reputation: 4924
Do you have a Table with id as primary key? Then you could give a double subselect a try:
I don't have a clue what the optimizer will make out of it
select id, (select name from (select array_agg(foo.name) name
from (select name from people where people.id = a.id LIMIT 2) as foo) as bar)
from xxx_table a
Upvotes: 0