Reputation: 2827
Is there a way to optimize a query done like this:
SELECT count(*) FROM mytable
WHERE
indexed_field IN ('v1', 'v2', ..., 'v2000')
AND NOT regular_field='something'
AND other_regular_field='something_else';
This query "works". The problem is that it is extremely slow (1 min+). I thought that using the IN
expression with a fixed list of values would be fine, but it is turning out not to be.
I'm using PostgreSQL 9.6.
My table has about 310K rows.
The EXPLAIN ANALYZE output:
QUERY PLAN
Aggregate (cost=396158.70..396158.71 rows=1 width=8) (actual time=8630.558..8630.559 rows=1 loops=1)
-> Seq Scan on mytable (cost=0.00..396156.77 rows=772 width=0) (actual time=7853.840..8630.478 rows=916 loops=1)
Filter: (((non_indexed_field)::text <> 'value1'::text) AND ((non_indexed_field)::text = 'value2'::text) AND ((indexed_field)::text = ANY ('{lots and lots of values....}'::text[])))
Rows Removed by Filter: 306768
Planning time: 2.543 ms
Execution time: 8630.770 ms
With EXPLAIN (ANALYZE, BUFFERS):
Aggregate (cost=396158.70..396158.71 rows=1 width=8) (actual time=9278.560..9278.560 rows=1 loops=1)
Buffers: shared hit=14244
-> Seq Scan on mytable (cost=0.00..396156.77 rows=772 width=0) (actual time=8584.520..9278.431 rows=916 loops=1)
Filter: (((non_indexed_field)::text <> 'value1'::text) AND ((non_indexed_field)::text = 'value2'::text) AND ((indexed_field)::text = ANY ('{lots and lots of values}'::text[])))
Rows Removed by Filter: 306768
Buffers: shared hit=14244
Planning time: 1.293 ms
Execution time: 9278.646 ms
(8 rows)
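For reference, the second plan was gathered by prefixing the query with EXPLAIN (ANALYZE, BUFFERS), i.e.:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM mytable
WHERE
indexed_field IN ('v1', 'v2', ..., 'v2000')
AND NOT regular_field='something'
AND other_regular_field='something_else';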
Upvotes: 1
Views: 3497
Reputation:
Very often, joining to a VALUES clause is more efficient:
SELECT count(*)
FROM mytable
JOIN (
values ('v1'), ('v2'), (...), ('v2000')
) as lookup(value) on lookup.value = mytable.some_column
WHERE NOT other_column = 'something'
AND another_column = 'something_else';
Pay attention to the format of the "list": in the VALUES clause, each value needs to be enclosed in its own pair of parentheses, i.e. ('v1'), ('v2'), ...
instead of ('v1', 'v2', ...)
Online example: http://rextester.com/UUWVG71217
Note that if the column you compare the values to really is a number (e.g. integer), you should not put the values in single quotes, e.g. values (1), (2), (3), ... (2000)
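A minimal sketch of that numeric variant (some_numeric_column is only a placeholder name, not a column from the original question):
SELECT count(*)
FROM mytable
JOIN (
  values (1), (2), (3) -- ..., (2000)
) as lookup(value) on lookup.value = mytable.some_numeric_column
WHERE NOT other_column = 'something'
AND another_column = 'something_else';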
Upvotes: 8
Reputation: 1056
My simple suggestion: create indexes on the three fields you are using here:
indexed_field, regular_field, other_regular_field
The database has to check all three fields, and if there is no index on them it falls back to a full table scan. Right now you have a very small amount of data, so it comes back in just over a minute; with 4-5 million rows it could take hours. Index the three fields and see the difference. Don't worry, the indexes will not hurt.
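A sketch of what that could look like (the index names are made up, and the column names are the placeholders from the question, so adapt them to the real schema):
CREATE INDEX mytable_indexed_field_idx ON mytable (indexed_field);
CREATE INDEX mytable_regular_field_idx ON mytable (regular_field);
CREATE INDEX mytable_other_regular_field_idx ON mytable (other_regular_field);
A single multicolumn index on the columns used with equality (for example other_regular_field together with indexed_field) is another option worth testing.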
Upvotes: 0
Reputation: 1077
I'd rather recommend using a sub-select (sub-query) to pre-select the smaller set of data, e.g.:
SELECT count(*) FROM
(SELECT * FROM mytable
WHERE indexed_field IN ('v1', 'v2', ..., 'v2000')) t
WHERE NOT t.regular_field='something'
AND t.other_regular_field='something_else';
Upvotes: -1