Reputation: 125
I'm stuck with jsonb indexes and need help. I have a table with jsonb:
+-------+----------+------------------------------------------------------------+-------+
|id |measure_id|parameters |value |
+-------+----------+------------------------------------------------------------+-------+
|564174 |19 |{"1": 12, "2": 59, "5": 79, "6": 249, "7": 248, "8": 412} |42.461 |
|564176 |19 |{"1": 12, "2": 59, "5": 80, "6": 249, "7": 248, "8": 412} |46.198 |
|568244 |19 |{"1": 12, "2": 316, "5": 129, "6": 249, "7": 248, "8": 412} |19.482 |
|568246 |19 |{"1": 12, "2": 316, "5": 130, "6": 249, "7": 248, "8": 412} |20.051 |
|572313 |19 |{"1": 12, "2": 331, "5": 113, "6": 249, "7": 248, "8": 412} |7.098 |
|596434 |19 |{"1": 193, "2": 297, "5": 124, "6": 249, "7": 248, "8": 412}|103.253|
|682354 |22 |{"1": 427, "2": 25, "5": 121, "6": 426, "9": 441, "11": 428}|0.132 |
|686423 |22 |{"1": 427, "2": 60, "5": 72, "6": 426, "9": 443, "11": 428} |0.000 |
|1682439|44 |{"1": 193, "2": 518, "5": 91, "6": 426, "9": 429, "11": 431}|8.321 |
|1686787|44 |{"1": 193, "2": 515, "5": 96, "6": 426, "9": 429, "11": 431}|23.062 |
+-------+----------+------------------------------------------------------------+-------+
It's statistical data: every row has a measure and a set of parameters. The number of parameters differs per measure, so I put them in a jsonb column. What I have to do:
select all distinct measures and parameters:
SELECT DISTINCT
measure_id,
jsonb_object_keys(parameters) AS parameter_id,
parameters -> jsonb_object_keys(parameters) AS parameter_value_id
FROM data;
select data from this table:
SELECT d.id,
d.measure_id,
CAST(d.attributes as TEXT) as attributes,
CAST(d.parameters as TEXT) as parameters,
d.value
FROM data d
WHERE d.measure_id=19
AND (jsonb_extract_path(d.parameters, '1')::bigint in (12))
AND (jsonb_extract_path(d.parameters, '2')::bigint in (2,59))
AND (jsonb_extract_path(d.parameters, '5')::bigint in (79, 80, 129, 130, 113))
AND (jsonb_extract_path(d.parameters, '6')::bigint in (249))
AND (jsonb_extract_path(d.parameters, '7')::bigint in (248))
AND (jsonb_extract_path(d.parameters, '8')::bigint in (412))
ORDER BY d.id;
Both queries run slowly. My indexes:
CREATE INDEX idx_data_measure ON data USING btree (measure_id);
CREATE INDEX idx_data_parameters
ON data USING btree (((parameters ->> '1'::text)::bigint), ((parameters ->> '2'::text)::bigint),
((parameters ->> '5'::text)::bigint), ((parameters ->> '6'::text)::bigint),
((parameters ->> '7'::text)::bigint), ((parameters ->> '8'::text)::bigint),
((parameters ->> '9'::text)::bigint), ((parameters ->> '10'::text)::bigint),
((parameters ->> '11'::text)::bigint), ((parameters ->> '458'::text)::bigint),
((parameters ->> '717'::text)::bigint), ((parameters ->> '718'::text)::bigint),
((parameters ->> '719'::text)::bigint), ((parameters ->> '720'::text)::bigint));
I've tried to create one combined index:
CREATE INDEX idx_data_parameters ON data USING btree (measure_id, ((parameters ->> '1'::text)::bigint),...
but this doesn't help.
I've tried EXPLAIN ANALYZE, but honestly I don't understand it :(
EXPLAIN ANALYZE
SELECT DISTINCT
measure_id,
jsonb_object_keys(parameters) AS parameter_id,
parameters -> jsonb_object_keys(parameters) AS parameter_value_id
FROM data;
QUERY PLAN
Unique (cost=2212571.28..2222400.17 rows=982889 width=72) (actual time=79346.142..84316.123 rows=5050 loops=1)
-> Sort (cost=2212571.28..2215028.50 rows=982889 width=72) (actual time=79346.141..82358.141 rows=5586011 loops=1)
        Sort Key: measure_id, (jsonb_object_keys(parameters)), ((parameters -> (jsonb_object_keys(parameters))))
Sort Method: external merge Disk: 202816kB
-> Gather (cost=1000.00..2034108.05 rows=982889 width=72) (actual time=2467.949..63448.545 rows=5586011 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Result (cost=0.00..1934819.15 rows=40953700 width=72) (actual time=2432.167..63305.298 rows=1862004 loops=3)
-> ProjectSet (cost=0.00..1218129.40 rows=40953700 width=156) (actual time=2432.151..62251.992 rows=1862004 loops=3)
-> Parallel Seq Scan on data (cost=0.00..1010289.37 rows=409537 width=124) (actual time=2432.118..61448.821 rows=327630 loops=3)
Planning Time: 0.417 ms
Execution Time: 84406.575 ms
I feel that I have the wrong indexes but can't create them properly. As I understand it, GIN is not a good idea because I need an IN clause on the parameters, so I made a BTREE. Please help me with it.
EDIT 1: PG Version: PostgreSQL 11.8. Also updated query to fit sample data.
EDIT 2: Query plan for the SELECT...WHERE... data query:
Sort (cost=1030.03..1030.04 rows=1 width=83) (actual time=63.659..63.661 rows=5 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=4881
-> Index Scan using idx_data_measure on data d (cost=0.55..1030.02 rows=1 width=83) (actual time=0.044..63.635 rows=5 loops=1)
Index Cond: (measure_id = 19)
      Filter: (((jsonb_extract_path(parameters, VARIADIC '{2}'::text[]))::bigint = ANY ('{2,59}'::bigint[])) AND ((jsonb_extract_path(parameters, VARIADIC '{1}'::text[]))::bigint = 12) AND ((jsonb_extract_path(parameters, VARIADIC '{6}'::text[]))::bigint = 249) AND ((jsonb_extract_path(parameters, VARIADIC '{7}'::text[]))::bigint = 248) AND ((jsonb_extract_path(parameters, VARIADIC '{8}'::text[]))::bigint = 412) AND ((jsonb_extract_path(parameters, VARIADIC '{5}'::text[]))::bigint = ANY ('{79,80,129,130,113}'::bigint[])))
Rows Removed by Filter: 28733
Buffers: shared hit=4881
Planning Time: 0.451 ms
Execution Time: 64.973 ms
I see that idx_data_measure is working, and that's all...
Upvotes: 0
Views: 1357
Reputation:
With Postgres 12, I would try something like this:
create index on data (measure_id);
create index on data using gin (parameters);
The operators @> and @? can make use of a GIN index, so maybe this gives better performance.
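If only containment-style operators are needed, the GIN index can also be built with the jsonb_path_ops operator class, which still supports @> (and @? on Postgres 12) but is typically smaller and faster for those searches than the default jsonb_ops. A sketch; whether it wins here depends on the data:

```sql
-- Alternative GIN index with jsonb_path_ops: supports @> and @?,
-- but not the key-existence operators ? / ?| / ?&.
create index on data using gin (parameters jsonb_path_ops);
```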
select *
from data
where measure_id = 19
and parameters @? '$."5" ? (@ == 79 || @ == 80 || @ == 129 || @ == 130 || @ == 113)'
and parameters @> '{"1": 12, "2": 59, "6": 249, "7": 248, "8": 412}'
With Postgres 11 you can't use the @? operator, so maybe split the condition up into one part that uses the @> operator (to enable use of the GIN index) and another that uses the IN condition. That will only be efficient if the condition using the @> operator is highly selective.
select *
from data
where measure_id = 19
and parameters @> '{"1": 12, "2": 59, "6": 249, "7": 248, "8": 412}'
and (parameters ->> '5')::bigint in (79,80,129,130,113);
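If the IN list on key "5" is what narrows the result the most, a single-key btree expression index could also serve that part directly. This is a sketch, not from the original answer, and only pays off if that key is selective; note the query expression must match the indexed expression exactly, including the ::bigint cast:

```sql
-- Composite btree index: equality on measure_id plus one extracted jsonb key.
-- Matches predicates like: measure_id = 19 AND (parameters ->> '5')::bigint IN (...)
create index on data (measure_id, ((parameters ->> '5')::bigint));
```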
Note there is no need to use jsonb_extract_path(). I don't know for sure, but maybe using ->> is faster.
The plan you posted is for the query that retrieves all rows from the table and also unnests all elements of the JSON. No index can speed that up.
But maybe doing the unnesting only once makes this faster:
SELECT DISTINCT
d.measure_id,
p.ky AS parameter_id,
p.value AS parameter_value_id
FROM data d
cross join jsonb_each(d.parameters) as p(ky, value)
But again: no index will help with that query, as you are retrieving all rows from the table.
Upvotes: 0