Lulu

Reputation: 470

Slow distinct PostgreSQL query on nested jsonb field won't use index

I'm trying to get distinct values from a nested field in a JSONB column, but it takes about 2 minutes on a 400K-row table. The original query used DISTINCT, but then I read that GROUP BY performs better, so I tried that too - still extremely slow.

Adding an index did not help either:

create index "orders_financial_status_index" on orders ((data ->'data'->> 'financial_status'));

EXPLAIN (ANALYZE) gave this result:

HashAggregate  (cost=13431.16..13431.22 rows=4 width=32) (actual time=123074.941..123074.943 rows=4 loops=1)
  Group Key: ((data -> 'data'::text) ->> 'financial_status'::text)
  ->  Seq Scan on orders  (cost=0.00..12354.14 rows=430809 width=32) (actual time=2.993..122780.325 rows=434080 loops=1)
Planning time: 0.119 ms
Execution time: 123074.979 ms

It's worth mentioning that there are no null values in this column, and currently there are 4 unique values.

What should I do in order to query the distinct values faster?

Upvotes: 2

Views: 361

Answers (1)

Laurenz Albe

Reputation: 247820

No index will make this faster, because the query has to scan the whole table.

As you can see, the sequential scan uses almost all the time; the hash aggregate is fast.

Still, I would not drop the index, because it allows PostgreSQL to estimate the number of groups accurately and choose the more efficient hash aggregate rather than sorting the rows. You can test without the index to be sure.
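To see the effect for yourself, you could compare the planner's estimates with and without the expression index (a sketch using the index name from your question):

```sql
-- With the index in place, ANALYZE has statistics for the expression,
-- so the estimated number of groups should be close to 4.
EXPLAIN SELECT data->'data'->>'financial_status'
FROM orders
GROUP BY data->'data'->>'financial_status';

-- Temporarily drop the index and rerun the EXPLAIN; without expression
-- statistics, the group estimate is usually far off (a generic default).
DROP INDEX "orders_financial_status_index";
```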

However, two minutes for fewer than half a million rows is still very slow. Do you have slow storage? Is the table bloated? If the latter, VACUUM (FULL) should improve things.
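A rough way to check for bloat is to compare the table's dead-tuple count to its live rows, then rewrite the table if dead tuples dominate (table name taken from the question; note that VACUUM (FULL) takes an exclusive lock on the table):

```sql
-- How big is the table, and how many dead tuples does it carry?
SELECT pg_size_pretty(pg_relation_size('orders'));
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- If n_dead_tup is large relative to n_live_tup, rewrite the table
-- to reclaim the space, then refresh statistics.
VACUUM (FULL) orders;
ANALYZE orders;
```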

You can speed up the query by reducing I/O. Load the table into RAM with pg_prewarm; then processing should be considerably faster.
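pg_prewarm ships with PostgreSQL as a contrib extension; loading the table into the buffer cache looks like this:

```sql
-- Install the extension (once per database), then load the table
-- into shared_buffers; the function returns the number of blocks read.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('orders');
```

This only helps if shared_buffers (or at least the OS cache) is large enough to hold the table, and the effect does not survive a server restart.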

Upvotes: 1
