Postgresql: query on jsonb column - index doesn't make it quicker

Question

There is a table in Postgresql 9.6, query on jsonb column is slow compared to a relational table, and adding a GIN index on it doesn't make it quicker.

Table:

-- create table
create table dummy_jsonb (
    id serial8,
    data jsonb,
    primary key (id)
);

-- create index
CREATE INDEX dummy_jsonb_data_index ON dummy_jsonb USING gin (data);
-- CREATE INDEX dummy_jsonb_data_index ON dummy_jsonb USING gin (data jsonb_path_ops);

Generate data:

-- generate data,
CREATE OR REPLACE FUNCTION dummy_jsonb_gen_data(n integer) RETURNS integer AS $$
DECLARE
    i integer:=1;
    name varchar;
    create_at varchar;
    json_str varchar;
BEGIN
    WHILE i<=n LOOP
        name:='dummy_' || i::text;
        create_at:=EXTRACT(EPOCH FROM date_trunc('milliseconds', now())) * 1000;
        json_str:='{
                 "name": "' || name || '",
                 "size": ' || i || ',
                 "create_at": ' || create_at || '
               }';

        insert into dummy_jsonb(data) values
        (json_str::jsonb
        );
        i:= i + 1;
    END LOOP;

    return n;
END;
$$ LANGUAGE plpgsql;

-- call function,
select dummy_jsonb_gen_data(1000000);

-- drop function,
DROP FUNCTION IF EXISTS dummy_jsonb_gen_data(integer);

Query:

select * from dummy_jsonb
where data->>'name' like 'dummy_%' and data->>'size' >= '500000'
order by data->>'size' desc
offset 50000 limit 10;

Test result:

The query takes 1.8 seconds on a slow vm.
Adding or removing the index, don't make a difference.
Changing to index gin with jsonb_path_ops, also don't make a difference.

Questions:

Is it possible to make the query quicker, either improve index or sql?
If not, the does it means, within pg a relational table is more proper in this case?
And, in my test, mongodb performs better, does that means mongodb is more proper for such storage & query?

user330315 · Accepted Answer

Quote from the manual

The default GIN operator class for jsonb supports queries with top-level key-exists operators ?, ?& and ?| operators and path/value-exists operator @> [...] The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only.

Your query uses LIKE and string comparison with > (which is probably not correct to begin with), neither of those are supported by a GIN index.

But even an index on (data ->> 'name') wouldn't be used for the condition data->>'name' like 'dummy_%' as that is true for all rows because every name starts with dummy.

You can create a regular btree index on the name:

CREATE INDEX ON dummy_jsonb ( (data ->> 'name') varchar_pattern_ops);

Which will be used if the condition is restrictive enough, e.g.:

where data->>'name' like 'dummy_9549%'

If you need to query for the size, you can create an index on ((data ->> 'size')::int) and then use something like this:

where (data->>'size')::int >= 500000

However your use of limit and offset will always force the database to read all rows, sort them and the limit the result. This is never going to be very fast. You might want to read this article for more information why limit/offset is not very efficient.

JSON is a nice addition to the relational world, but only if you use it appropriately. If you don't need dynamic attributes for a row, then use standard columns and data types. Even though JSON support is Postgres is extremely good, this doesn't mean one should use it for everything, just because it's the current hype. Postgres is still a relational database and should be used as such.

Unrelated, but: your function to generate the test data can be simplified to a single SQL statement. You might not have been aware of the generate_series() function for things like that:

insert into dummy_jsonb(data)
select jsonb_build_object('name', 'dummy_'||i, 
                          'size', i::text, 
                          'created_at', (EXTRACT(EPOCH FROM date_trunc('milliseconds', clock_timestamp())) * 1000)::text)
from generate_series(1,1000000) as t(i);

Postgresql: query on jsonb column - index doesn't make it quicker

Answers (2)

Related Questions

Postgresql: query on jsonb column - index doesn&#39;t make it quicker

Answers (2)

Related Questions

Postgresql: query on jsonb column - index doesn't make it quicker