donmaro
donmaro

Reputation: 351

How to make Postgres GIN index work with jsonb_* functions?

The problem I found that I cannot make Postgres use GIN index when I use jsonb_* functions in my queries. The problem exists for both jsonb_ops and jsonb_path_ops operator classes.

Let's first make preparations. Create the table

CREATE TABLE applications
(
    id          TEXT PRIMARY KEY,
    application JSONB
);
CREATE INDEX ON applications USING gin (application jsonb_path_ops);

Let's fill the table with some bulk of data (to make Postgres use GIN index)

INSERT INTO applications(id, application)
VALUES ('1', '{
  "type_code": 1,
  "persons": [
    {
      "type_code": 4,
      "firstname": "John",
      "lastname": "Doe"
    }
  ]
}');
INSERT INTO applications (SELECT i, a.application FROM applications a, generate_series(2, 100000) i);

Then I try to select the data like the following:

EXPLAIN ANALYZE
SELECT * FROM applications
WHERE applications.application @? '$.persons[*] ? (@.type_code == 3)';

Note that GIN index was used

-- Bitmap Heap Scan on applications  (cost=64.00..68.01 rows=1 width=130) (actual time=0.410..0.419 rows=0 loops=1)
-- "  Recheck Cond: (application @? '$.""persons""[*]?(@.""type_code"" == 3)'::jsonpath)"
--   ->  Bitmap Index Scan on applications_application_idx  (cost=0.00..64.00 rows=1 width=0) (actual time=0.095..0.096 rows=0 loops=1)
-- "        Index Cond: (application @? '$.""persons""[*]?(@.""type_code"" == 3)'::jsonpath)"
-- Planning Time: 1.493 ms
-- Execution Time: 0.861 ms

Now I try to select the data like this:

EXPLAIN ANALYZE
SELECT * FROM applications
WHERE jsonb_path_exists(
      applications.application,
      '$.persons[*] ? (@.type_code == 3)'
);

You can see that GIN index was not used in this case:

-- Aggregate  (cost=3374.33..3374.34 rows=1 width=8) (actual time=114.048..114.055 rows=1 loops=1)
--   ->  Seq Scan on applications  (cost=0.00..3291.00 rows=33333 width=0) (actual time=0.388..109.580 rows=100000 loops=1)
-- "        Filter: jsonb_path_exists(application, '$.""persons""[*]?(@.""type_code"" == 3)'::jsonpath, '{}'::jsonb, false)"
-- Planning Time: 1.514 ms
-- Execution Time: 114.674 ms

Is it possible to make Postgres use GIN index in the second query?

Using jsonb_* functions is preferred for me because I can use positional parameters to build query:

SELECT * FROM applications
WHERE jsonb_path_exists(
      applications.application,
      '$.persons[*] ? (@.type_code == $person_type_code)',
      jsonb_build_object('person_type_code', $1)
);

Upvotes: 2

Views: 272

Answers (1)

Zegarek
Zegarek

Reputation: 26347

How to make Postgres GIN index work with jsonb_* functions?

You can't*. PostgreSQL indexes are tied to operators in specific operator classes:

In general, PostgreSQL indexes can be used to optimize queries that contain one or more WHERE or JOIN clauses of the form

indexed-column indexable-operator comparison-value

Here, the indexed-column is whatever column or expression the index has been defined on. The indexable-operator is an operator that is a member of the index's operator class for the indexed column. And the comparison-value can be any expression that is not volatile and does not reference the index's table.

GIN will help you only if you use the operators in the opclass you used when you defined the index (jsonb_ops by default):

The default GIN operator class for jsonb supports queries with the key-exists operators ?, ?| and ?&, the containment operator @>, and the jsonpath match operators @? and @@.

Even though there are equivalent jsonb_path_X() functions that do the exact same thing those operators do, the index will only kick in if you use the operator and not the function.


*Except you kind of can

There are cases like PostGIS where functions do in fact use the index but that's because they wrap an operator or add an operator-based condition that's using the index, then use the actual function to just re-check pre-filtered rows. You can mimmick that if you want: demo

CREATE OR REPLACE FUNCTION my_jsonb_path_exists(arg1 jsonb,arg2 jsonpath)
RETURNS boolean AS 'SELECT $1 @? $2' LANGUAGE 'sql' IMMUTABLE;

EXPLAIN ANALYZE
SELECT * FROM applications
WHERE my_jsonb_path_exists(
      applications.application,
      '$.persons[*] ? (@.type_code == 3)'
);
QUERY PLAN
Bitmap Heap Scan on applications (cost=165.51..5277.31 rows=21984 width=163) (actual time=15.650..83.960 rows=22219 loops=1)
  Recheck Cond: (application @? '$."persons"[*]?(@."type_code" == 3)'::jsonpath)
  Heap Blocks: exact=4798
  -> Bitmap Index Scan on gin_idx (cost=0.00..160.01 rows=21984 width=0) (actual time=14.891..14.892 rows=22219 loops=1)
        Index Cond: (application @? '$."persons"[*]?(@."type_code" == 3)'::jsonpath)
Planning Time: 0.231 ms
Execution Time: 85.092 ms

You can see now it uses the index because the condition got rewritten as the operator it was wrapping. It finds 22219 matches because I increased the sample set to 200k and randomised the rows.

Upvotes: 3

Related Questions