Reputation:

PostgreSQL - max number of parameters in "IN" clause?

In Postgres, you can specify an IN clause, like this:

SELECT * FROM user WHERE id IN (1000, 1001, 1002)

Does anyone know what's the maximum number of parameters you can pass into IN?

Upvotes: 225

Views: 207529

Answers (10)

Brad J
Brad J

Reputation: 51

To further expand on this for others, we ran into an issue doing an IN on a field that is designated as a UUID; we were trying to send a little more than 35000 and batching it in groups of 10000 fixed the issue.

Upvotes: 2

OZahed
OZahed

Reputation: 493

I know I'm too late to the question, but it may help others like me whom still searching for this, According to postgres documents there is no specific limit on Number of values, but there is a limit on query parameters size (65,535 Bytes), This query parameter limitation only applys to the query itself, if you are using the query

select * from table1 where id in (select table1_id from table2)

there the 65,535 B Limit won't apply, In such cases, after a number of values inside IN statement, that query optimizer decides based on the column datatype and other factors postgres will ignore the indexes on the id column, so you may want to keep the list in less than a hundred of items if using index on that column

Upvotes: 1

Andrew
Andrew

Reputation: 876

Just tried it. the answer is -> out-of-range integer as a 2-byte value: 32768

Upvotes: 16

blubb
blubb

Reputation: 9910

As someone more experienced with Oracle DB, I was concerned about this limit too. I carried out a performance test for a query with ~10'000 parameters in an IN-list, fetching prime numbers up to 100'000 from a table with the first 100'000 integers by actually listing all the prime numbers as query parameters.

My results indicate that you need not worry about overloading the query plan optimizer or getting plans without index usage, since it will transform the query to use = ANY({...}::integer[]) where it can leverage indices as expected:

-- prepare statement, runs instantaneous:
PREPARE hugeplan (integer, integer, integer, ...) AS
SELECT *
FROM primes
WHERE n IN ($1, $2, $3, ..., $9592);

-- fetch the prime numbers:
EXECUTE hugeplan(2, 3, 5, ..., 99991);

-- EXPLAIN ANALYZE output for the EXECUTE:
"Index Scan using n_idx on primes  (cost=0.42..9750.77 rows=9592 width=5) (actual time=0.024..15.268 rows=9592 loops=1)"
"  Index Cond: (n = ANY ('{2,3,5,7, (...)"
"Execution time: 16.063 ms"

-- setup, should you care:
CREATE TABLE public.primes
(
  n integer NOT NULL,
  prime boolean,
  CONSTRAINT n_idx PRIMARY KEY (n)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.primes
  OWNER TO postgres;

INSERT INTO public.primes
SELECT generate_series(1,100000);

However, this (rather old) thread on the pgsql-hackers mailing list indicates that there is still a non-negligible cost in planning such queries, so take my word with a grain of salt.

Upvotes: 28

nimai
nimai

Reputation: 2332

This is not really an answer to the present question, however it might help others too.

At least I can tell there is a technical limit of 32767 values (=Short.MAX_VALUE) passable to the PostgreSQL backend, using Posgresql's JDBC driver 9.1.

This is a test of "delete from x where id in (... 100k values...)" with the postgresql jdbc driver:

Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 100000
    at org.postgresql.core.PGStream.SendInteger2(PGStream.java:201)

Upvotes: 121

Yevhen Surovskyi
Yevhen Surovskyi

Reputation: 961

explain select * from test where id in (values (1), (2));

QUERY PLAN

 Seq Scan on test  (cost=0.00..1.38 rows=2 width=208)
   Filter: (id = ANY ('{1,2}'::bigint[]))

But if try 2nd query:

explain select * from test where id = any (values (1), (2));

QUERY PLAN

Hash Semi Join  (cost=0.05..1.45 rows=2 width=208)
       Hash Cond: (test.id = "*VALUES*".column1)
       ->  Seq Scan on test  (cost=0.00..1.30 rows=30 width=208)
       ->  Hash  (cost=0.03..0.03 rows=2 width=4)
             ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)

We can see that postgres build temp table and join with it

Upvotes: 44

Yevhen Surovskyi
Yevhen Surovskyi

Reputation: 961

If you have query like:

SELECT * FROM user WHERE id IN (1, 2, 3, 4 -- and thousands of another keys)

you may increase performace if rewrite your query like:

SELECT * FROM user WHERE id = ANY(VALUES (1), (2), (3), (4) -- and thousands of another keys)

Upvotes: 0

Prasanth Jayachandran
Prasanth Jayachandran

Reputation: 570

There is no limit to the number of elements that you are passing to IN clause. If there are more elements it will consider it as array and then for each scan in the database it will check if it is contained in the array or not. This approach is not so scalable. Instead of using IN clause try using INNER JOIN with temp table. Refer http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/ for more info. Using INNER JOIN scales well as query optimizer can make use of hash join and other optimization. Whereas with IN clause there is no way for the optimizer to optimize the query. I have noticed speedup of at least 2x with this change.

Upvotes: 23

Jordan S. Jones
Jordan S. Jones

Reputation: 13893

According to the source code located here, starting at line 850, PostgreSQL doesn't explicitly limit the number of arguments.

The following is a code comment from line 870:

/*
 * We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
 * possible if the inputs are all scalars (no RowExprs) and there is a
 * suitable array type available.  If not, we fall back to a boolean
 * condition tree with multiple copies of the lefthand expression.
 * Also, any IN-list items that contain Vars are handled as separate
 * boolean conditions, because that gives the planner more scope for
 * optimization on such clauses.
 *
 * First step: transform all the inputs, and detect whether any are
 * RowExprs or contain Vars.
 */

Upvotes: 110

PatrikAkerstrand
PatrikAkerstrand

Reputation: 45731

You might want to consider refactoring that query instead of adding an arbitrarily long list of ids... You could use a range if the ids indeed follow the pattern in your example:

SELECT * FROM user WHERE id >= minValue AND id <= maxValue;

Another option is to add an inner select:

SELECT * 
FROM user 
WHERE id IN (
    SELECT userId
    FROM ForumThreads ft
    WHERE ft.id = X
);

Upvotes: 0

Related Questions