Hassan Shahbaz
Hassan Shahbaz

Reputation: 606

Select multiple non aggregated columns with group by in postgres

I'm making a query with having multiple non aggregated columns with group by clause but Postgres is throwing an error that I have to add non aggregated columns in group by or use any aggregate function on that column this is the query that I'm trying to run.

select 
    tb1.pipeline as pipeline_id, 
    tb3.pipeline_name as pipeline_name, 
    tb2."name" as integration_name,
    cast(tb1.integration_id as VARCHAR) as integration_id,
    tb1.created_at as created_at, 
    cast(tb1.id as VARCHAR) as batch_id,
    sum(tb1.row_select) as row_select, 
    sum(tb1.row_insert) as row_insert, 
from 
    table1 tb1
join 
    table2 tb2 on tb1.integration_id = tb2.id 
join 
    table3 tb3 on tb1.pipeline = tb3.id 
where 
    tb1.pipeline is not null 
    and tb1.is_super_parent = false 
group by 
    tb1.pipeline

and I found one solution/hack for this error that is I added max function in all other non aggregated columns this solves my problem.

select 
    tb1.pipeline as pipeline_id, 
    max(tb3.pipeline_name) as pipeline_name, 
    max(tb2."name") as integration_name, 
    max(cast(tb1.integration_id as VARCHAR)) as integration_id,
    max(tb1.created_at) as created_at, 
    max(cast(tb1.id as VARCHAR)) as batch_id,
    sum(tb1.row_select) as row_select, 
    sum(tb1.row_insert) as row_insert, 
from 
    table1 tb1
join 
    table2 tb2 on tb1.integration_id = tb2.id 
join 
    table3 tb3 on tb1.pipeline = tb3.id 
where 
    tb1.pipeline is not null 
    and tb1.is_super_parent = false 
group by 
    tb1.pipeline

But I don't want to add max functions when there is no need for that second thing is that applying max to all other column query will be expensive so any other better approach that I can do to solve the above issue, thanks in advance.

Upvotes: 0

Views: 1013

Answers (1)

Belayer
Belayer

Reputation: 14861

Well the first thing you need is to learn to format your queries in so as to get an idea of their flow at a glance. Note due to the extra comma in row_insert, from your query will give a syntax error. With that said; How do you solve your issue?
You cannot avoid the additional aggregates or the expanded group by as long as the exist in the scope same query. You need to separate the aggregation from selection of additional columns. You basically have 2 choices:

  1. Perform the aggregation in a CTE.
    with sums (pipeline_id, row_select, row_insert)  as
         ( select tb1.pipeline
                , sum(tb1.row_select) as row_select
                , sum(tb1.row_insert) as row_insert
           table1 tb1
           where tb1.pipeline is not null 
             and tb1.is_super_parent = false 
           group by tb1.pipeline 
         )
    select s.pipeline_id
         , tbl3.pipeline_name
         , tb2."name" integration_name 
         , s.row_select
         , s.row_insert 
      from sums s
      join table2 tbl2 on (s.pipeline_id = tb2.id)
      join table3 tbl3 on (s.pipeline_id = tb3.id);
  1. Perform the aggregation in a sub-query.
    select s.pipeline_id
         , tbl3.pipeline_name
         , tb2."name" integration_name
         , s.row_select
         , s.row_insert 
      from ( select tb1.pipeline
                , sum(tb1.row_select) as row_select
                , sum(tb1.row_insert) as row_insert
           table1 tb1
           where tb1.pipeline is not null 
             and tb1.is_super_parent = false 
           group by tb1.pipeline 
         ) s
      join table2 tbl2 on (s.pipeline_id = tb2.id)
      join table3 tbl3 on (s.pipeline_id = tb3.id);

NOTE: Not tested as no sample data supplied.

Upvotes: 1

Related Questions