sur

Reputation: 31

How to create index on json array in postgres?

I have a JSON field called 'elements' in my table demo which contains an array 'data' of key-value pairs. The 'data' array has the structure below and may contain multiple JSON entries. I am using Postgres version 9.5.

{
    "data": [{
        "ownr": "1",
        "sigUsr": [2],
        "sigStat": "APPR",
        "modifiedOn": 1494229698039,
        "isDel": "false",
        "parentId": "nil",
        "disName": "exmp.json",
        "uniqueId": "d88cb52",
        "usrType": "owner",
        "usrId": "1",
        "createdOn": 1494229698039,
        "obType": "file"
    }]
}

In my query I have multiple filters based on obj (e.g. obj->>'usrId', obj->>'sigUsr', etc.), where obj corresponds to json_array_elements(demo.elements->'data'). How do I create btree indices on filters like obj->>'usrId' and obj->>'sigUsr'? Please advise.
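
A simplified version of the kind of query I am running (the filter values here are just examples):

SELECT d.*
FROM demo d,
     json_array_elements(d.elements->'data') AS obj
WHERE obj->>'usrId' = '1'
  AND obj->>'obType' = 'file';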

Regards sur

Upvotes: 2

Views: 2535

Answers (2)

nh2

Reputation: 25665

While you cannot create indices on JSON arrays, you can apparently convert them to Postgres ARRAYs and create GIN indexes on those.

Inspired by How to index name of an element in xml column in Postgres, here is an example:

https://dbfiddle.uk/mNr6btiM

Extract:

  • Schema and example value:
CREATE TABLE trees (
   id INT PRIMARY KEY,
   body JSONB
);

INSERT INTO trees (id, body)
VALUES(1, '
{
  "id": 1,
  "name": "root",
  "children": [
    {
      "id": 2,
      "name": "Node 2",
      "children": [
        {
          "id": 3,
          "name": "Node 3",
          "children": [
            {
              "id": 4,
              "name": "Node 4",
              "children": [
              ]
            }
          ]
        },
        {
          "id": 5,
          "name": "Node 5",
          "children": [
          ]
        }
      ]
    }
  ]
}
');
  • Helper function, so that it is easier to keep the index in sync with the expression we want the index to speed up. The helper function also performs the JSON-array-to-Postgres-array conversion using jsonb_array_elements_text():
CREATE OR REPLACE FUNCTION
  tree_names(body jsonb)
RETURNS text[] LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$$
  SELECT ARRAY(SELECT jsonb_array_elements_text(jsonb_path_query_array( body, 'strict $.** ? (@.id >= 3) .name'))) ;
$$ ;

In this example I use the JSONPath query function jsonb_path_query_array() to find and transform the elements I'm interested in (here, finding all elements with id >= 3 and returning their names).

  • Index creation:
CREATE INDEX tree_names_gin_index ON trees USING GIN(tree_names(body));
  • A query using an array membership test (@>) that gets sped up by the index:
EXPLAIN ANALYZE
SELECT id FROM trees WHERE tree_names(body) @> ARRAY['Node 3']; -- uses index

  • A query that apparently can NOT use the index (using = ANY(...)):
EXPLAIN ANALYZE
SELECT id FROM trees WHERE 'Node 3' = ANY(tree_names(body)); -- does not use index

I have not tested the performance of this in detail yet, but EXPLAIN ANALYZE suggests that the added GIN index is used on Postgres 14:

QUERY PLAN
...
 Bitmap Index Scan on tree_names_gin_index

In my example, the Postgres array is an array of strings. I have not checked yet if it is possible or easy to do it with arrays of {non-primitive / compound data types / JSON objects}.
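
Applied to the table from the question, an (untested) sketch could look like the following. It assumes the elements column is jsonb, the function and index names are made up, and it avoids jsonb_path_query_array() (Postgres 12+) and PARALLEL SAFE (Postgres 9.6+) so it should also work on older versions such as the 9.5 mentioned in the question:

CREATE OR REPLACE FUNCTION
  demo_usr_ids(body jsonb)
RETURNS text[] LANGUAGE sql IMMUTABLE STRICT AS
$$
  -- Collect every usrId from the 'data' array into a Postgres text[].
  SELECT ARRAY(SELECT obj->>'usrId'
               FROM jsonb_array_elements(body->'data') AS obj);
$$ ;

CREATE INDEX demo_usr_ids_gin_index ON demo USING GIN (demo_usr_ids(elements));

-- Membership test that the GIN index can serve:
SELECT * FROM demo WHERE demo_usr_ids(elements) @> ARRAY['1'];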

Upvotes: 2

C.C. Hsu

Reputation: 309

First, if your column datatype is JSON rather than JSONB, you can only create an index on the whole column demo.elements. Only JSONB columns can have indices on JSON keys.

You might consider changing your datatype if you declared it as JSON in this table.
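
For example, an existing JSON column can usually be converted in place with a cast, assuming every stored value is valid JSON:

alter table demo alter column elements type jsonb using elements::jsonb;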

Then I turned your situation into a test case as follows.

create table demo(
elements jsonb
);
insert into demo values(
'{
    "data": [
        {
            "ownr": "1",
            "sigUsr": [
                2
            ],
            "sigStat": "APPR",
            "modifiedOn": 1494229698039,
            "isDel": "false",
            "parentId": "nil",
            "disName": "exmp.json",
            "uniqueId": "d88cb52",
            "usrType": "owner",
            "usrId": "1",
            "createdOn": 1494229698039,
            "obType": "file"
        }
    ]
}'
);

The query you ask about can be achieved in the following ways, as far as I can see.

postgres=# -- First possible query
postgres=# select elements->'data'->0->'usrId', elements->'data'->0->'sigUsr' from demo;
 ?column? | ?column? 
----------+----------
 "1"      | [2]
(1 row)

postgres=# -- Second possible query, with jsonb_array_elements()
postgres=# select obj->>'usrId', obj->>'sigUsr' from demo d,  jsonb_array_elements(d.elements->'data') as obj;
 ?column? | ?column? 
----------+----------
 1        | [2]
(1 row)

postgres=# 

I can only create an index for the first approach, which is a restrictive use case: you need to hard-code a specific array entry in your index (in this case the 0th element).

postgres=# create index ON demo ((elements->'data'->0->'usrId'));
CREATE INDEX
postgres=# 
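
Note that a query only benefits from this expression index when its filter uses exactly the same expression, for example (a sketch, not taken from the original question):

select * from demo where elements->'data'->0->'usrId' = '"1"'::jsonb;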

I cannot create an index for the second approach, since jsonb_array_elements() returns a setof jsonb type.

postgres=# create index ON demo ((jsonb_array_elements(elements->'data')->>'usrId'));
ERROR:  set-returning functions are not allowed in index expressions
LINE 1: create index ON demo ((jsonb_array_elements(elements->'data'...
                               ^
postgres=# 

I think you should store the sub-JSON entries in individual rows rather than storing them in a JSON array.
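
A rough sketch of what that could look like (table and column names are only illustrative):

create table demo_data (
    demo_id   int,      -- reference back to the parent demo row
    usr_id    text,
    sig_usr   int[],
    entry     jsonb     -- the rest of the original json entry, if still needed
);
create index on demo_data (usr_id);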

Upvotes: 1
