Reputation: 2827
Is there a way to optimize a query done like this:
SELECT count(*) FROM mytable
WHERE
indexed_field IN ('v1', 'v2', ..., 'v2000')
AND NOT regular_field='something'
AND other_regular_field='something_else';
This query "works". The problem is that it is extremely slow (1 min+). I thought that using the IN
expression with a fixed list of values would be fine, but it is turning out not to be.
I'm using PostgreSQL 9.6.
My table has about 310K rows.
The EXPLAIN ANALYZE output:
QUERY PLAN
Aggregate (cost=396158.70..396158.71 rows=1 width=8) (actual time=8630.558..8630.559 rows=1 loops=1)
-> Seq Scan on mytable (cost=0.00..396156.77 rows=772 width=0) (actual time=7853.840..8630.478 rows=916 loops=1)
Filter: (((non_indexed_field)::text <> 'value1'::text) AND ((non_indexed_field)::text = 'value2'::text) AND ((indexed_field)::text = ANY ('{lots and lots of values....}'::text[])))
Rows Removed by Filter: 306768
Planning time: 2.543 ms
Execution time: 8630.770 ms
With EXPLAIN (ANALYZE, BUFFERS):
Aggregate (cost=396158.70..396158.71 rows=1 width=8) (actual time=9278.560..9278.560 rows=1 loops=1)
Buffers: shared hit=14244
-> Seq Scan on mytable (cost=0.00..396156.77 rows=772 width=0) (actual time=8584.520..9278.431 rows=916 loops=1)
Filter: (((non_indexed_field)::text <> 'value1'::text) AND ((non_indexed_field)::text = 'value2'::text) AND ((indexed_field)::text = ANY ('{lots and lots of values}'::text[])))
Rows Removed by Filter: 306768
Buffers: shared hit=14244
Planning time: 1.293 ms
Execution time: 9278.646 ms
(8 rows)
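For reference, the second plan was gathered by prefixing the query with EXPLAIN (ANALYZE, BUFFERS), i.e.:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM mytable
WHERE
indexed_field IN ('v1', 'v2', ..., 'v2000')
AND NOT regular_field='something'
AND other_regular_field='something_else';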
Upvotes: 1
Views: 3497
Reputation:
Very often, joining to a VALUES clause is more efficient:
SELECT count(*)
FROM mytable
JOIN (
values ('v1'), ('v2'), (...), ('v2000')
) as lookup(value) on lookup.value = mytable.some_column
WHERE NOT other_column = 'something'
AND another_column = 'something_else';
Pay attention to the format of the "list": in the VALUES clause, each value needs to be enclosed in its own pair of parentheses, i.e. ('v1'), ('v2'), ...
instead of ('v1', 'v2', ...)
Online example: http://rextester.com/UUWVG71217
Note that if the column you compare the values to really is a number (e.g. integer), you should not put the values in single quotes, e.g. values (1), (2), (3), ... (2000)
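A minimal sketch of that numeric variant (some_numeric_column is only a placeholder name, not a column from the original question):
SELECT count(*)
FROM mytable
JOIN (
  values (1), (2), (3) -- ..., (2000)
) as lookup(value) on lookup.value = mytable.some_numeric_column
WHERE NOT other_column = 'something'
AND another_column = 'something_else';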
Upvotes: 8
Reputation: 1056
My simple suggestion: create indexes on the three fields you are using here:
indexed_field, regular_field, other_regular_field
The database has to check all three fields, and if there is no index on them it falls back to a full table scan. Right now you have a very small amount of data, so it comes back in just over a minute; with 4-5 million rows it could take hours. Index the three fields and see the difference. Don't worry, the indexes will not hurt.
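A sketch of what that could look like (the index names are made up, and the column names are the placeholders from the question, so adapt them to the real schema):
CREATE INDEX mytable_indexed_field_idx ON mytable (indexed_field);
CREATE INDEX mytable_regular_field_idx ON mytable (regular_field);
CREATE INDEX mytable_other_regular_field_idx ON mytable (other_regular_field);
A single multicolumn index on the columns used with equality (for example other_regular_field together with indexed_field) is another option worth testing.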
Upvotes: 0
Reputation: 1077
I'd rather recommend using a sub-select (sub-query) to pre-select the smaller set of data, e.g.:
SELECT count(*) FROM
(SELECT * FROM mytable
WHERE indexed_field IN ('v1', 'v2', ..., 'v2000')) t
WHERE NOT t.regular_field='something'
AND t.other_regular_field='something_else';
Upvotes: -1