Reputation: 351
The problem I found that I cannot make Postgres use GIN index when I use jsonb_* functions in my queries. The problem exists for both jsonb_ops and jsonb_path_ops operator classes.
Let's first make preparations. Create the table
CREATE TABLE applications
(
id TEXT PRIMARY KEY,
application JSONB
);
CREATE INDEX ON applications USING gin (application jsonb_path_ops);
Let's fill the table with some bulk of data (to make Postgres use GIN index)
INSERT INTO applications(id, application)
VALUES ('1', '{
"type_code": 1,
"persons": [
{
"type_code": 4,
"firstname": "John",
"lastname": "Doe"
}
]
}');
INSERT INTO applications (SELECT i, a.application FROM applications a, generate_series(2, 100000) i);
Then I try to select the data like the following:
EXPLAIN ANALYZE
SELECT * FROM applications
WHERE applications.application @? '$.persons[*] ? (@.type_code == 3)';
Note that GIN index was used
-- Bitmap Heap Scan on applications (cost=64.00..68.01 rows=1 width=130) (actual time=0.410..0.419 rows=0 loops=1)
-- " Recheck Cond: (application @? '$.""persons""[*]?(@.""type_code"" == 3)'::jsonpath)"
-- -> Bitmap Index Scan on applications_application_idx (cost=0.00..64.00 rows=1 width=0) (actual time=0.095..0.096 rows=0 loops=1)
-- " Index Cond: (application @? '$.""persons""[*]?(@.""type_code"" == 3)'::jsonpath)"
-- Planning Time: 1.493 ms
-- Execution Time: 0.861 ms
Now I try to select the data like this:
EXPLAIN ANALYZE
SELECT * FROM applications
WHERE jsonb_path_exists(
applications.application,
'$.persons[*] ? (@.type_code == 3)'
);
You can see that GIN index was not used in this case:
-- Aggregate (cost=3374.33..3374.34 rows=1 width=8) (actual time=114.048..114.055 rows=1 loops=1)
-- -> Seq Scan on applications (cost=0.00..3291.00 rows=33333 width=0) (actual time=0.388..109.580 rows=100000 loops=1)
-- " Filter: jsonb_path_exists(application, '$.""persons""[*]?(@.""type_code"" == 3)'::jsonpath, '{}'::jsonb, false)"
-- Planning Time: 1.514 ms
-- Execution Time: 114.674 ms
Is it possible to make Postgres use GIN index in the second query?
Using jsonb_* functions is preferred for me because I can use positional parameters to build query:
SELECT * FROM applications
WHERE jsonb_path_exists(
applications.application,
'$.persons[*] ? (@.type_code == $person_type_code)',
jsonb_build_object('person_type_code', $1)
);
Upvotes: 2
Views: 272
Reputation: 26347
How to make Postgres GIN index work with jsonb_* functions?
You can't*. PostgreSQL indexes are tied to operators in specific operator classes:
In general, PostgreSQL indexes can be used to optimize queries that contain one or more WHERE or JOIN clauses of the form
indexed-column
indexable-operator
comparison-value
Here, the
indexed-column
is whatever column or expression the index has been defined on. Theindexable-operator
is an operator that is a member of the index's operator class for the indexed column. And thecomparison-value
can be any expression that is not volatile and does not reference the index's table.
GIN will help you only if you use the operators in the opclass you used when you defined the index (jsonb_ops
by default):
The default GIN operator class for
jsonb
supports queries with the key-exists operators?
,?|
and?&
, the containment operator@>
, and thejsonpath
match operators@?
and@@
.
Even though there are equivalent jsonb_path_X()
functions that do the exact same thing those operators do, the index will only kick in if you use the operator and not the function.
*Except you kind of can
There are cases like PostGIS where functions do in fact use the index but that's because they wrap an operator or add an operator-based condition that's using the index, then use the actual function to just re-check pre-filtered rows. You can mimmick that if you want: demo
CREATE OR REPLACE FUNCTION my_jsonb_path_exists(arg1 jsonb,arg2 jsonpath)
RETURNS boolean AS 'SELECT $1 @? $2' LANGUAGE 'sql' IMMUTABLE;
EXPLAIN ANALYZE
SELECT * FROM applications
WHERE my_jsonb_path_exists(
applications.application,
'$.persons[*] ? (@.type_code == 3)'
);
QUERY PLAN |
---|
Bitmap Heap Scan on applications (cost=165.51..5277.31 rows=21984 width=163) (actual time=15.650..83.960 rows=22219 loops=1) |
Recheck Cond: (application @? '$."persons"[*]?(@."type_code" == 3)'::jsonpath) |
Heap Blocks: exact=4798 |
-> Bitmap Index Scan on gin_idx (cost=0.00..160.01 rows=21984 width=0) (actual time=14.891..14.892 rows=22219 loops=1) |
Index Cond: (application @? '$."persons"[*]?(@."type_code" == 3)'::jsonpath) |
Planning Time: 0.231 ms |
Execution Time: 85.092 ms |
You can see now it uses the index because the condition got rewritten as the operator it was wrapping. It finds 22219 matches because I increased the sample set to 200k and randomised the rows.
Upvotes: 3