Reputation: 1033
I need to solve the following task: I have quite a large array of IDs in a PHP script, and I need to select from a MySQL DB all rows whose IDs are NOT IN this array.
There are several similar questions (e.g. How to find all records which are NOT in this array? (MySql)), and the most popular answer is to use a NOT IN () construction with implode(',', $array) inside the brackets.
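Roughly, the approach looks like this (a sketch; the connection details and the yourtable / id names are placeholders):

// Placeholder connection and ID array; in reality $ids holds ~2007 values.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$ids = [1, 2, 3];

// Build the comma-separated list and exclude those IDs in one query.
$idList = implode(',', array_map('intval', $ids));
$rows = $pdo->query("SELECT * FROM yourtable WHERE id NOT IN ($idList)")
            ->fetchAll(PDO::FETCH_ASSOC);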
And this worked... until my array grew to 2007 IDs and about 20 kB (in my case), at which point I got a "MySQL server has gone away" error. As I understand it, this is because of the length of the query.
There are also some solutions to this problem like this:
SET GLOBAL max_allowed_packet=1073741824;
(just taken from this question).
Probably I could do it this way; however, now I doubt that the NOT IN (implode) approach is a good one for big arrays (I expect that in my case the array can grow up to 8000 IDs and 100 kB).
Is there any better solution for big arrays?
Thanks!
EDIT 1
As a solution, it is recommended to insert all IDs from the array into a temporary table and then use a JOIN to solve the initial task. This is clear. However, I have never used temporary tables, so I have some additional questions (probably worth a separate question, but I decided to leave them here):
If I need to do this routine several times during one MySQL session, which approach is better:
1. Each time I need to SELECT IDs NOT IN the PHP array, I create a NEW temporary table (all those tables will be deleted when the MySQL connection terminates, i.e. when my script finishes).
2. I create one temporary table and drop it after I have made the needed SELECT.
3. I keep one temporary table and TRUNCATE it afterwards.
Which is better? Or have I missed something else?
Upvotes: 2
Views: 433
Reputation: 4042
I once had to tackle this problem, but with an IN(id) WHERE clause containing approximately 20,000-30,000 identifiers.
The way I got around this, for a SELECT query, was to reduce the number of identifiers filtered per query and increase the number of times I sent the same query, in order to extract the same data.
You could use array_chunk
for PHP and divide 20,000
by 15
, which would give you 15
separate SQL Calls, filtering records by 1500
identifiers (per call, you can divide more than 15
to reduce the number of identifiers
further). But in your case, if you just divide 2007
idenitifers by 10
it would reduce the number of identifiers you're pushing to the database to 200
per SQL request, there are otherways to optimize this further with temporary tables and so fourth.
By dividing the number of indexes you're trying filter it will speed up each query, to run faster than if you were to send every index to the database in a single dump.
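A minimal sketch of the chunking idea, assuming a PDO connection ($pdo) and a hypothetical items table with an id column, for the IN() case described above:

// Hypothetical connection and ID list; adjust names to your own schema.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$ids = range(1, 20000); // stand-in for the real identifier array

$rows = [];
foreach (array_chunk($ids, 1500) as $chunk) {
    // One placeholder per identifier in this chunk keeps each query short.
    $placeholders = implode(',', array_fill(0, count($chunk), '?'));
    $stmt = $pdo->prepare("SELECT * FROM items WHERE id IN ($placeholders)");
    $stmt->execute($chunk);
    $rows = array_merge($rows, $stmt->fetchAll(PDO::FETCH_ASSOC));
}

Note that merging the per-chunk results like this only works directly for IN(); for the NOT IN case in the question, the chunked results would have to be intersected instead of unioned.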
Upvotes: 1
Reputation: 12798
In such cases it is usually better to create a temporary table and perform the query against it instead. It'd be something along the lines of:
CREATE TEMPORARY TABLE t1 (a int);
INSERT INTO t1 VALUES (1),(2),(3);
SELECT * FROM yourtable
LEFT JOIN t1 on (yourtable.id=t1.a)
WHERE t1.a IS NULL;
Of course, the INSERT statement should be constructed so that all the values from your array are inserted into the temporary table.
Edit: Inserting all values in a single INSERT statement would most probably lead to the same problem you already faced. Hence, I'd suggest using a prepared statement that is executed repeatedly to insert the data into the temporary table as you iterate through the PHP array.
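A minimal sketch of that flow, assuming a PDO connection ($pdo) and your ID array ($ids); yourtable and its id column are placeholders:

// The temporary table exists only for the current connection.
$pdo->exec('CREATE TEMPORARY TABLE t1 (a INT)');

// One prepared statement, executed once per ID, keeps every packet small.
$insert = $pdo->prepare('INSERT INTO t1 (a) VALUES (?)');
foreach ($ids as $id) {
    $insert->execute([$id]);
}

// Rows of yourtable whose id is not present in the array.
$rows = $pdo->query(
    'SELECT yourtable.* FROM yourtable
     LEFT JOIN t1 ON yourtable.id = t1.a
     WHERE t1.a IS NULL'
)->fetchAll(PDO::FETCH_ASSOC);

Wrapping the insert loop in a transaction, or inserting in modest batches, would speed the loading up noticeably.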
Upvotes: 3