Willi
Willi

Reputation: 407

How to aggregate only parts of members of each group in one query statement in PG?

I have a table people(gid int, name varchar) with data:

gid    name                 
1      Bill
2      Will
2      Musk
2      Jack
1      Martin
1      Jorge

now I want to get a result set like below:

  gid   representative(varchar)
   1     Bill, Martin
   2     Will, Musk

I mean to list out each group with two of people's names of that group as representative. Here the order doesn't matter, the count matters. How to do it in one query statement, i.e. in PG?

After multiple trying for a long while, I can achieve the goal by this statement:

with indexed as(
    select gid,name,row_number() over (partition by gid) as index
    from people
),filtered as(
    select gid, name 
    from indexed where index<3
)
select gid,string_agg(name,',')
from filtered
group by gid;

However, the actual table has huge number rows and more complicated schema. It consumes long time to just create the first CTE table "indexed" because of traversing the whole table. Is there better resolution to direct database engine to select out just parts of rows for each group at first, without full traversing?

Upvotes: 0

Views: 66

Answers (3)

Willi
Willi

Reputation: 407

A colleague remainds me of array_sample, and by lucky coincidence, I don't care which rows would win out( here which two names should be selected out). So the solution in my case is:

select gid,array_sample(array_agg(name),2) from people group by gid

I tested it on my actual "big" table with hundred million rows, It run obviously much faster than my old CTE solution.

Upvotes: 0

d r
d r

Reputation: 7846

One option is to use Row_Number() analytic function and STRING_AGG()

--    S a m p l e    D a t a :
Create Table People As
( Select 1 as gid, 'Bill' as name Union All
  Select 2,        'Will' Union All
  Select 2,        'Musk' Union All
  Select 2,        'Jack' Union All
  Select 1,        'Martin' Union All
  Select 1,        'Jorge' )
--    S Q L : 
SELECT gid, STRING_AGG(name, ', ') as names
FROM   ( Select gid, name, Row_Number() Over(Partition By gid Order By gid) as rn
         From   People )
WHERE  rn <= 2
GROUP BY gid
/*
gid names
--- ---------------
  1 Bill, Martin
  2 Will, Musk      */

See the fiddle here.

Upvotes: 0

Turo
Turo

Reputation: 4924

Do you have a Table with id as primary key? Then you could give a double subselect a try:

I don't have a clue what the optimizer will make out of it

select id, (select name from (select array_agg(foo.name) name 
    from (select name from people where people.id = a.id LIMIT 2) as  foo) as bar)
from xxx_table a

Upvotes: 0

Related Questions