Vlada Katlinskaya
Vlada Katlinskaya

Reputation: 1033

How do I select rows which IDs not in PHPs LARGE array?

I need to solve the following task: I have a quite large array of IDs in PHP script and I need to select from MySQL DB all rows with IDs NOT IN this array.

There are several similar questions (How to find all records which are NOT in this array? (MySql)) and the most favourite answer is use NOT IN () construction with implode(',',$array) within a brackets.

And this worked... until my array gown up to 2007 IDs and about 20 kB (in my case) I've got a "MySQL server has gone away" error. As I can understand this is because of the lengthy query.

There are also some solutions to this problem like this:

SET GLOBAL max_allowed_packet=1073741824;

(just taken from this question).

Probably I could do it in this way, however now I doubt that NOT IN (implode) approach is a good one to a big arrays (I expect that in my case array can be up to 8000 IDs and 100 kB).

Is there any better solution for a big arrays?

Thanks!

EDIT 1

As a solution it is recommended to insert all IDs from array to a temporary table and than use JOIN to solve the initial task. This is clear. However I never used temporary tables and therefore I have some additional question (probably worth to be as a separate question but I decided to leave it here):

If I need to do this routine several times during one MySQL session, which approach will be better:

  1. Each time I need to SELECT ID NOT IN PHP array I will create a NEW temporary table (all those tables will be deleted after MySQL connection termination - after my script will be terminated in fact).

  2. I will create a temporary table and delete one after I made needed SELECT

  3. I will TRNCATE a temporary table afterwards.

Which is the better? Or I missed something else?

Upvotes: 2

Views: 433

Answers (2)

classicjonesynz
classicjonesynz

Reputation: 4042

I've once had to tackle this problem, but with a IN(id) WHERE Clause with approx 20,000-30,000 identifiers (indexes).

The way I got around this, with SELECT query, was that I reduced the number of filtered identifiers and increased the number of times I sent the same query, in order to extract the same data.

You could use array_chunk for PHP and divide 20,000 by 15, which would give you 15 separate SQL Calls, filtering records by 1500 identifiers (per call, you can divide more than 15 to reduce the number of identifiers further). But in your case, if you just divide 2007 idenitifers by 10 it would reduce the number of identifiers you're pushing to the database to 200 per SQL request, there are otherways to optimize this further with temporary tables and so fourth.

By dividing the number of indexes you're trying filter it will speed up each query, to run faster than if you were to send every index to the database in a single dump.

Upvotes: 1

vhu
vhu

Reputation: 12798

In such cases it is usually better to create a temporary table and perform the query against it instead. It'd be something along the lines of:

CREATE TEMPORARY TABLE t1 (a int);

INSERT INTO t1 VALUES (1),(2),(3);

SELECT * FROM yourtable 
 LEFT JOIN t1 on (yourtable.id=t1.a)
 WHERE t1.a IS NULL;

Of course INSERT statement should be constructed so that you'd insert all values from your array into the temporary table.

Edit: Inserting all values in a single INSERT statement would most probably lead into the same problem you already faced. Hence I'd suggest that you use a prepared statement that will be executed to insert the data into temporary table while you iterate through the PHP array.

Upvotes: 3

Related Questions