Get array of records based on two keys in same table

Question

I tried the following query on the table below:

SELECT DISTINCT
  a.main_id,
 array_agg(distinct a.secondary_id )  AS arr                    
FROM table1 a JOIN table1 b ON a.secondary_id = b.secondary_id or a.tertiary_id = b.tertiary_id
group by a.main_id, a.secondary_id , b.tertiary_id

I added DISTINCT to omit the duplicates, but I cannot get the whole row as an element of the array, and the query does not even group the rows into arrays according to the requirement below. I was following this.

Table script:

 Create table table1 
(  
   id bigserial NOT NULL,
   main_id integer NOT NULL,
   secondary_id integer,
   tertiary_id integer,
   data1 text,
   data2 text,
   CONSTRAINT table1_pk PRIMARY KEY (main_id)

)

Data:

INSERT INTO table1(
             main_id, secondary_id, tertiary_id, data1, data2)
    VALUES (1,2,NULL,'data1_1_2_N','data2_1_2_N'),
       (2,2,NULL,'data1_2_2_N','data2_2_2_N'),
       (3,3,5,'data1_3_3_5','data2_3_3_5'),
       (4,3,5,'data1_4_3_5','data2_4_3_5'),
       (5,NULL,1,'data1_5_N_1','data2_5_N_1'),
       (6,NULL,1,'data1_6_N_1','data2_6_N_1'),
       (7,NULL,1,'data1_7_N_1','data2_7_N_1'),
       (8,NULL,2,'data1_8_N_2','data2_8_N_2'),
       (9,NULL,2,'data1_9_N_2','data2_9_N_2'),
       (10,NULL,3,'data1_10_N_3','data2_10_N_3'),
       (11,12,12,'data1_11_12_12','data2_11_12_12'),
       (12,12,11,'data1_12_12_11','data2_12_12_11')

Requirement:

If secondary_id is equal in two or more rows, those rows should be treated as one set; otherwise, rows with an equal tertiary_id should be treated as one set.

Expected Result:

   1 | {(1,2,NULL,'data1_1_2_N','data2_1_2_N'),(2,2,NULL,'data1_2_2_N','data2_2_2_N')}
   2 | {(3,3,5,'data1_3_3_5','data2_3_3_5'),(4,3,5,'data1_4_3_5','data2_4_3_5')}
   3 | {(5,NULL,1,'data1_5_N_1','data2_5_N_1'),(6,NULL,1,'data1_6_N_1','data2_6_N_1'),(7,NULL,1,'data1_7_N_1','data2_7_N_1')}
   4 | {(8,NULL,2,'data1_8_N_2','data2_8_N_2'),(9,NULL,2,'data1_9_N_2','data2_9_N_2')}
   5 | {(10,NULL,3,'data1_10_N_3','data2_10_N_3')}
   6 | {(11,12,12,'data1_11_12_12','data2_11_12_12'),(12,12,11,'data1_12_12_11','data2_12_12_11')}

Version: PostgreSQL 9.3.11

Kamil Gosciminski · Accepted Answer

This should produce your expected output. The trick lies in the conditional GROUP BY clause, which handles records where both secondary_id and tertiary_id are set and a matching record exists on either of those fields.

select array_agg(distinct t1) 
from table1 t1 
join table1 t2 on 
  t1.secondary_id = t2.secondary_id 
  or t1.tertiary_id = t2.tertiary_id 
group by
  case 
    when t1.secondary_id is null or t1.tertiary_id is null 
      then concat(t1.secondary_id,'#',t1.tertiary_id) -- #1
    when t1.secondary_id is not null and t1.tertiary_id is not null and t1.secondary_id = t2.secondary_id 
      then t1.secondary_id::TEXT -- #2
    when t1.secondary_id is not null and t1.tertiary_id is not null and t1.tertiary_id = t2.tertiary_id 
      then t1.tertiary_id::TEXT -- #3
    end 
order by 1

The standard case, #1, is when either of the two fields is null. We need to group by both columns, and we do that by concatenating the two column values with a # separator and grouping by the concatenated string.
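As a minimal sketch of what the #1 key looks like on the sample data (the alias group_key is made up for illustration): concat() skips NULL arguments, whereas the || operator would turn the whole expression into NULL, so the key stays usable when one of the columns is missing.

```sql
-- Grouping key for rows where one of the two columns is NULL.
-- concat() ignores NULL arguments, so (2, NULL) gives '2#' and (NULL, 1) gives '#1'.
select main_id,
       secondary_id,
       tertiary_id,
       concat(secondary_id, '#', tertiary_id) as group_key  -- illustrative alias
from table1
where secondary_id is null
   or tertiary_id is null
order by main_id;
```

With the sample rows, main_id 1 and 2 both get '2#', and main_id 5, 6 and 7 all get '#1', which is why each of those sets lands in one group.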

For #2 and #3 we need to cast the grouping value to text so that the query is accepted: all branches of a CASE expression must return the same type, and #1 already returns text.
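A small illustration of that type rule, using throwaway literals instead of the table columns (so treat it as a sketch, not part of the actual query):

```sql
-- Mixing a text branch with an integer branch is rejected by PostgreSQL:
--   select case when true then concat(1, '#', 2) else 3 end;
-- Casting the integer branch to text gives every branch the same type.
select case when true then concat(1, '#', 2) else 3::text end as group_key;
```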

Option #2 covers the case where both values are not null and secondary_id matches between the two rows paired by the self-join. Option #3 is analogous, but for a tertiary_id match.

Output:

                                                 array_agg
------------------------------------------------------------------------------------------------------------
 {"(1,1,2,,data1_1_2_N,data2_1_2_N)","(2,2,2,,data1_2_2_N,data2_2_2_N)"}
 {"(3,3,3,5,data1_3_3_5,data2_3_3_5)","(4,4,3,5,data1_4_3_5,data2_4_3_5)"}
 {"(5,5,,1,data1_5_N_1,data2_5_N_1)","(6,6,,1,data1_6_N_1,data2_6_N_1)","(7,7,,1,data1_7_N_1,data2_7_N_1)"}
 {"(8,8,,2,data1_8_N_2,data2_8_N_2)","(9,9,,2,data1_9_N_2,data2_9_N_2)"}
 {"(10,10,,3,data1_10_N_3,data2_10_N_3)"}
 {"(11,11,12,12,data1_11_12_12,data2_11_12_12)","(12,12,12,11,data1_12_12_11,data2_12_12_11)"}

If you'd like to drop the id column from the records, you could use a CTE that selects all columns except id and then refer to that CTE in the FROM clause.
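A rough sketch of that CTE variant follows. The CTE name records is my own choice, the CASE uses the null check on both columns, and I have not run it against 9.3.11, so take it as a starting point rather than a verified query:

```sql
-- records is a hypothetical CTE name; it exposes every column except id.
with records as (
  select main_id, secondary_id, tertiary_id, data1, data2
  from table1
)
select array_agg(distinct t1)   -- t1 is now a row of the CTE, without id
from records t1
join records t2 on
  t1.secondary_id = t2.secondary_id
  or t1.tertiary_id = t2.tertiary_id
group by
  case
    when t1.secondary_id is null or t1.tertiary_id is null
      then concat(t1.secondary_id, '#', t1.tertiary_id)  -- #1
    when t1.secondary_id = t2.secondary_id
      then t1.secondary_id::text                          -- #2
    when t1.tertiary_id = t2.tertiary_id
      then t1.tertiary_id::text                           -- #3
  end
order by 1;
```

The "is not null" checks from #2 and #3 are left out here only to keep the sketch short: once #1 has not matched, both columns are known to be non-null.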
