Indexing jsonb column in Postgresql and IN clause

Question

I'm stuck with jsonb indexes and need help. I have a table with jsonb:

+-------+----------+------------------------------------------------------------+-------+
|id     |measure_id|parameters                                                  |value  |
+-------+----------+------------------------------------------------------------+-------+
|564174 |19        |{"1": 12, "2": 59, "5": 79, "6": 249, "7": 248, "8": 412}   |42.461 |
|564176 |19        |{"1": 12, "2": 59, "5": 80, "6": 249, "7": 248, "8": 412}   |46.198 |
|568244 |19        |{"1": 12, "2": 316, "5": 129, "6": 249, "7": 248, "8": 412} |19.482 |
|568246 |19        |{"1": 12, "2": 316, "5": 130, "6": 249, "7": 248, "8": 412} |20.051 |
|572313 |19        |{"1": 12, "2": 331, "5": 113, "6": 249, "7": 248, "8": 412} |7.098  |
|596434 |19        |{"1": 193, "2": 297, "5": 124, "6": 249, "7": 248, "8": 412}|103.253|
|682354 |22        |{"1": 427, "2": 25, "5": 121, "6": 426, "9": 441, "11": 428}|0.132  |
|686423 |22        |{"1": 427, "2": 60, "5": 72, "6": 426, "9": 443, "11": 428} |0.000  |
|1682439|44        |{"1": 193, "2": 518, "5": 91, "6": 426, "9": 429, "11": 431}|8.321  |
|1686787|44        |{"1": 193, "2": 515, "5": 96, "6": 426, "9": 429, "11": 431}|23.062 |
+-------+----------+------------------------------------------------------------+-------+

It's some statistical data and every row has measure and some parameters set. The number of parameters is different for every measure, so I put them in jsonb column. What I have to do:

select all distinct measures and parameters:

 SELECT DISTINCT 
     measure_id, 
     jsonb_object_keys(parameters) AS parameter_id, 
     parameters -> jsonb_object_keys(parameters) AS parameter_value_id 
 FROM data;

select data from this table:

 SELECT d.id,
    d.measure_id,
    CAST(d.attributes as TEXT) as attributes,
    CAST(d.parameters as TEXT) as parameters,
    d.value
 FROM data d
 WHERE d.measure_id=19
   AND (jsonb_extract_path(d.parameters, '1')::bigint in (12))
   AND (jsonb_extract_path(d.parameters, '2')::bigint in (2,59))
   AND (jsonb_extract_path(d.parameters, '5')::bigint in (79, 80, 129, 130, 113))
   AND (jsonb_extract_path(d.parameters, '6')::bigint in (249))
   AND (jsonb_extract_path(d.parameters, '7')::bigint in (248))
   AND (jsonb_extract_path(d.parameters, '8')::bigint in (412))
 ORDER BY d.id;

Both queries are running slow. My indexes:

CREATE INDEX idx_data_measure ON data USING btree (measure_id);

CREATE INDEX idx_data_parameters
ON data USING btree (((parameters ->> '1'::text)::bigint), ((parameters ->> '2'::text)::bigint),
         ((parameters ->> '5'::text)::bigint), ((parameters ->> '6'::text)::bigint),
         ((parameters ->> '7'::text)::bigint), ((parameters ->> '8'::text)::bigint),
         ((parameters ->> '9'::text)::bigint), ((parameters ->> '10'::text)::bigint),
         ((parameters ->> '11'::text)::bigint), ((parameters ->> '458'::text)::bigint),
         ((parameters ->> '717'::text)::bigint), ((parameters ->> '718'::text)::bigint),
         ((parameters ->> '719'::text)::bigint), ((parameters ->> '720'::text)::bigint));

I've tried to create one combined index:

CREATE INDEX idx_data_parameters ON data USING btree (measure_id, ((parameters ->> '1'::text)::bigint),...

but this doesn't help.

I've tried EXPLAIN ANALYZE, but honestly I don't understand it :(

EXPLAIN ANALYZE
SELECT DISTINCT
        measure_id,
        jsonb_object_keys(parameters) AS parameter_id,
        parameters -> jsonb_object_keys(parameters) AS parameter_value_id
FROM data;

QUERY PLAN
Unique  (cost=2212571.28..2222400.17 rows=982889 width=72) (actual time=79346.142..84316.123 rows=5050 loops=1)
  ->  Sort  (cost=2212571.28..2215028.50 rows=982889 width=72) (actual time=79346.141..82358.141 rows=5586011 loops=1)
        Sort Key: measure_id, (jsonb_object_keys(parameters)), ((parameters -> (jsonb_object_keys(parameters))))"
        Sort Method: external merge  Disk: 202816kB
        ->  Gather  (cost=1000.00..2034108.05 rows=982889 width=72) (actual time=2467.949..63448.545 rows=5586011 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Result  (cost=0.00..1934819.15 rows=40953700 width=72) (actual time=2432.167..63305.298 rows=1862004 loops=3)
                    ->  ProjectSet  (cost=0.00..1218129.40 rows=40953700 width=156) (actual time=2432.151..62251.992 rows=1862004 loops=3)
                          ->  Parallel Seq Scan on data  (cost=0.00..1010289.37 rows=409537 width=124) (actual time=2432.118..61448.821 rows=327630 loops=3)
Planning Time: 0.417 ms
Execution Time: 84406.575 ms

I feel that I have wrong indexes, but can't create it properly. As I understand GIN is not good idea as I need IN clause for parameters, so I made BTREE. Please help me with it.

EDIT 1: PG Version: PostgreSQL 11.8. Also updated query to fit sample data.

EDIT 2: Query plan for select data SELECT...WHERE...:

Sort  (cost=1030.03..1030.04 rows=1 width=83) (actual time=63.659..63.661 rows=5 loops=1)
  Sort Key: id
  Sort Method: quicksort  Memory: 26kB
  Buffers: shared hit=4881
  ->  Index Scan using idx_data_measure on data d  (cost=0.55..1030.02 rows=1 width=83) (actual time=0.044..63.635 rows=5 loops=1)
        Index Cond: (measure_id = 19)
        Filter: (((jsonb_extract_path(parameters, VARIADIC '{2}'::text[]))::bigint = ANY ('{2,59}'::bigint[])) AND ((jsonb_extract_path(parameters, VARIADIC '{1}'::text[]))::bigint = 12) AND ((jsonb_extract_path(parameters, VARIADIC '{6}'::text[]))::bigint = 249) AND ((jsonb_extract_path(parameters, VARIADIC '{7}'::text[]))::bigint = 248) AND ((jsonb_extract_path(parameters, VARIADIC '{8}'::text[]))::bigint = 412) AND ((jsonb_extract_path(parameters, VARIADIC '{5}'::text[]))::bigint = ANY ('{79,80,129,130,113}'::bigint[])))"
        Rows Removed by Filter: 28733
        Buffers: shared hit=4881
Planning Time: 0.451 ms
Execution Time: 64.973 ms

I see that idx_data_measure is working, and that's all...

Indexing jsonb column in Postgresql and IN clause

Answers (1)

Related Questions