Sarath
Sarath

Reputation: 9156

Query performance on join with json_agg in postgres

I have two tables

Table entries


 id    | val1 | val2
-------+------+------+
 ent_1 | xxx  | xxx
 ent_2 | xyy  | yyy
 ent_3 | zxz  | zzz
 ent_4 | zxz  | zzz

Table entries_list


 id  | entry_id | val1 | val2
-----+----------+------+-------
   1 |   ent_1  | xxx  | xxx
   2 |   ent_1  | xyy  | yyy
   3 |   ent_2  | zxz  | zzz
   4 |   ent_2  | zxz  | zzz

entries_list.entry_id is forigen key from entries.id

So I need to find the entries which have a corresponding entry_list references. I don't want the entries which doesn't have any reference in entry_list and the result I am expecting from this is

[{
    id: ent_1,
    entries: [{
        id: 1,
        val1: xxx,
        val2: xxx
    }, {
        id: 1,
        val1: xxx,
        val2: xxx
    }]
}, {
    id: ent_2,
    entries: [{
        id: 3,
        val1: xxx,
        val2: xxx
    }, {
        id: 4,
        val1: xxx,
        val2: xxx
    }]
}]

because of the desired result and structure, I decided to use Json_agg and Json_build_object the query looks like this

SELECT entries.id, 
       Json_agg(Json_build_object('id', list.id, 'val1', list.val2, 'val2', 
       list.val2)) AS sub_list 
FROM   entries 
       INNER JOIN (SELECT id,val1,val2 
                   FROM   entries_list) AS list 
               ON entries.id = list.entry_id
GROUP  BY entries.id 
ORDER  BY entries.id 

But it performs very bad looks like almost 10 sec for 1M records. So what is the better way to change this?

I think of getting the data in the plan way and do grouping it in code outside sql, but how the query should be modified in both approach.?

I have nodejs backend and pg module as a connector.

Upvotes: 3

Views: 4818

Answers (2)

klin
klin

Reputation: 121624

You can use exists instead of regular join:

select 
    entry_id, 
    json_agg(json_build_object('id', id, 'val1', val2, 'val2', val2)) as sub_list 
from entries_list
where exists (
    select 1 
    from entries e 
    where entry_id = e.id)
group by entry_id 
order by entry_id;

You need indexes on entries_list(entry_id) and (obviously, it's probably a primary key) on entries(id).

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269923

How does this version perform?

SELECT e.id, 
       (SELECT Json_agg(Json_build_object('id', el.id, 'val1', el.val2, 'val2', 
       el.val2))
        FROM entries_list el 
        WHERE el.entry_id = e.id
       ) as sub_list 
FROM entries e 
ORDER BY e.id ;

For performance, you want an index on entries_list(entry_id, id, val2). The first key is particularly important.

Upvotes: 4

Related Questions