zhoriq
zhoriq

Reputation: 125

Indexing jsonb column in Postgresql and IN clause

I'm stuck with jsonb indexes and need help. I have a table with jsonb:

+-------+----------+------------------------------------------------------------+-------+
|id     |measure_id|parameters                                                  |value  |
+-------+----------+------------------------------------------------------------+-------+
|564174 |19        |{"1": 12, "2": 59, "5": 79, "6": 249, "7": 248, "8": 412}   |42.461 |
|564176 |19        |{"1": 12, "2": 59, "5": 80, "6": 249, "7": 248, "8": 412}   |46.198 |
|568244 |19        |{"1": 12, "2": 316, "5": 129, "6": 249, "7": 248, "8": 412} |19.482 |
|568246 |19        |{"1": 12, "2": 316, "5": 130, "6": 249, "7": 248, "8": 412} |20.051 |
|572313 |19        |{"1": 12, "2": 331, "5": 113, "6": 249, "7": 248, "8": 412} |7.098  |
|596434 |19        |{"1": 193, "2": 297, "5": 124, "6": 249, "7": 248, "8": 412}|103.253|
|682354 |22        |{"1": 427, "2": 25, "5": 121, "6": 426, "9": 441, "11": 428}|0.132  |
|686423 |22        |{"1": 427, "2": 60, "5": 72, "6": 426, "9": 443, "11": 428} |0.000  |
|1682439|44        |{"1": 193, "2": 518, "5": 91, "6": 426, "9": 429, "11": 431}|8.321  |
|1686787|44        |{"1": 193, "2": 515, "5": 96, "6": 426, "9": 429, "11": 431}|23.062 |
+-------+----------+------------------------------------------------------------+-------+

It's some statistical data and every row has measure and some parameters set. The number of parameters is different for every measure, so I put them in jsonb column. What I have to do:

  1. select all distinct measures and parameters:

     SELECT DISTINCT 
         measure_id, 
         jsonb_object_keys(parameters) AS parameter_id, 
         parameters -> jsonb_object_keys(parameters) AS parameter_value_id 
     FROM data;
    
  2. select data from this table:

     SELECT d.id,
        d.measure_id,
        CAST(d.attributes as TEXT) as attributes,
        CAST(d.parameters as TEXT) as parameters,
        d.value
     FROM data d
     WHERE d.measure_id=19
       AND (jsonb_extract_path(d.parameters, '1')::bigint in (12))
       AND (jsonb_extract_path(d.parameters, '2')::bigint in (2,59))
       AND (jsonb_extract_path(d.parameters, '5')::bigint in (79, 80, 129, 130, 113))
       AND (jsonb_extract_path(d.parameters, '6')::bigint in (249))
       AND (jsonb_extract_path(d.parameters, '7')::bigint in (248))
       AND (jsonb_extract_path(d.parameters, '8')::bigint in (412))
     ORDER BY d.id;
    

Both queries are running slow. My indexes:

CREATE INDEX idx_data_measure ON data USING btree (measure_id);

CREATE INDEX idx_data_parameters
ON data USING btree (((parameters ->> '1'::text)::bigint), ((parameters ->> '2'::text)::bigint),
         ((parameters ->> '5'::text)::bigint), ((parameters ->> '6'::text)::bigint),
         ((parameters ->> '7'::text)::bigint), ((parameters ->> '8'::text)::bigint),
         ((parameters ->> '9'::text)::bigint), ((parameters ->> '10'::text)::bigint),
         ((parameters ->> '11'::text)::bigint), ((parameters ->> '458'::text)::bigint),
         ((parameters ->> '717'::text)::bigint), ((parameters ->> '718'::text)::bigint),
         ((parameters ->> '719'::text)::bigint), ((parameters ->> '720'::text)::bigint));

I've tried to create one combined index:

CREATE INDEX idx_data_parameters ON data USING btree (measure_id, ((parameters ->> '1'::text)::bigint),...

but this doesn't help.

I've tried EXPLAIN ANALYZE, but honestly I don't understand it :(

EXPLAIN ANALYZE
SELECT DISTINCT
        measure_id,
        jsonb_object_keys(parameters) AS parameter_id,
        parameters -> jsonb_object_keys(parameters) AS parameter_value_id
FROM data;

QUERY PLAN
Unique  (cost=2212571.28..2222400.17 rows=982889 width=72) (actual time=79346.142..84316.123 rows=5050 loops=1)
  ->  Sort  (cost=2212571.28..2215028.50 rows=982889 width=72) (actual time=79346.141..82358.141 rows=5586011 loops=1)
        Sort Key: measure_id, (jsonb_object_keys(parameters)), ((parameters -> (jsonb_object_keys(parameters))))"
        Sort Method: external merge  Disk: 202816kB
        ->  Gather  (cost=1000.00..2034108.05 rows=982889 width=72) (actual time=2467.949..63448.545 rows=5586011 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Result  (cost=0.00..1934819.15 rows=40953700 width=72) (actual time=2432.167..63305.298 rows=1862004 loops=3)
                    ->  ProjectSet  (cost=0.00..1218129.40 rows=40953700 width=156) (actual time=2432.151..62251.992 rows=1862004 loops=3)
                          ->  Parallel Seq Scan on data  (cost=0.00..1010289.37 rows=409537 width=124) (actual time=2432.118..61448.821 rows=327630 loops=3)
Planning Time: 0.417 ms
Execution Time: 84406.575 ms

I feel that I have wrong indexes, but can't create it properly. As I understand GIN is not good idea as I need IN clause for parameters, so I made BTREE. Please help me with it.

EDIT 1: PG Version: PostgreSQL 11.8. Also updated query to fit sample data.

EDIT 2: Query plan for select data SELECT...WHERE...:

Sort  (cost=1030.03..1030.04 rows=1 width=83) (actual time=63.659..63.661 rows=5 loops=1)
  Sort Key: id
  Sort Method: quicksort  Memory: 26kB
  Buffers: shared hit=4881
  ->  Index Scan using idx_data_measure on data d  (cost=0.55..1030.02 rows=1 width=83) (actual time=0.044..63.635 rows=5 loops=1)
        Index Cond: (measure_id = 19)
        Filter: (((jsonb_extract_path(parameters, VARIADIC '{2}'::text[]))::bigint = ANY ('{2,59}'::bigint[])) AND ((jsonb_extract_path(parameters, VARIADIC '{1}'::text[]))::bigint = 12) AND ((jsonb_extract_path(parameters, VARIADIC '{6}'::text[]))::bigint = 249) AND ((jsonb_extract_path(parameters, VARIADIC '{7}'::text[]))::bigint = 248) AND ((jsonb_extract_path(parameters, VARIADIC '{8}'::text[]))::bigint = 412) AND ((jsonb_extract_path(parameters, VARIADIC '{5}'::text[]))::bigint = ANY ('{79,80,129,130,113}'::bigint[])))"
        Rows Removed by Filter: 28733
        Buffers: shared hit=4881
Planning Time: 0.451 ms
Execution Time: 64.973 ms

I see that idx_data_measure is working, and that's all...

Upvotes: 0

Views: 1357

Answers (1)

user330315
user330315

Reputation:

With Postgres 12, I would try something like this:

create index on data (measure_id);
create index on data using gin (parameters);

The operators @> and @? can make use of a GIN index, so maybe this gives better performance.

select *
from data
where measure_id = 19 
  and parameters @? '$."5" ? (@ == 79 || @ == 80 || @ == 129 || @ == 130 || @ == 112)'
  and parameters @> '{"1": 12, "2": 59, "6": 249, "7": 248, "8": 412}'

With Postgres 11 you can't use the @? operator, so maybe splitting the condition up into one that uses the @> operator (to enable use of the GIN index) and the other using the IN condition. That will only be efficient if the condition using the @> operator is highly selective.

select *
from data
where measure_id = 19 
  and parameters @> '{"1": 12, "2": 59, "6": 249, "7": 248, "8": 412}'
  and (parameters ->> '5')::bigint in (79,80,129,130,112);

Note there is no need to use jsonb_extract_path(). I don't know, but maybe using ->> is faster.


The plan you posted shows the query that retrieves all rows from the table and also unnests all elements from the JSON. There is no index that will speed that up.

But maybe doing the unnesting only once, makes this faster:

SELECT DISTINCT
        d.measure_id,
        p.ky AS parameter_id,
        p.value AS parameter_value_id
FROM data d
  cross join jsonb_each(d.parameters) as p(ky, value)

But again: no index will help with that query, as you are retrieving all rows from the table.

Upvotes: 0

Related Questions