Reputation: 2147
I have a set of approximately 1.1 million unique IDs and I need to determine which of them do not have a corresponding record in my application's database. The set of IDs comes from a database as well, but not the same one. I am using PHP and MySQL and have plenty of memory: PHP runs on a server with 15GB of RAM, and MySQL runs on its own server with 7.5GB.
Normally I'd simply load all the IDs in one query and then use them in the IN clause of a SELECT query to do the comparison in one shot.
So far my attempts have resulted in scripts that either take an unbearably long time or spike the CPU to 100%.
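For reference, the one-shot version I'd normally write looks something like this (a sketch only; the records table and the variable names are illustrative):

```php
<?php
// Sketch of the usual one-shot approach: bind every ID into a single
// IN (...) list. $pdo is a PDO connection to the application database
// and $idsToCheck holds the ~1.1 million IDs (illustrative names).
$placeholders = implode(',', array_fill(0, count($idsToCheck), '?'));
$stmt = $pdo->prepare("SELECT id FROM records WHERE id IN ($placeholders)");
$stmt->execute($idsToCheck);
$found = $stmt->fetchAll(PDO::FETCH_COLUMN);

// Whatever is in $idsToCheck but not in $found has no matching record.
$missing = array_diff($idsToCheck, $found);
```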
What's the best way to load such a large data set and do this comparison?
Upvotes: 0
Views: 1420
Reputation: 212452
Generate a dump of the IDs from the first database into a file, re-load it into a temporary table on the second database, and then do a LEFT JOIN between that temporary table and the second database's table to identify the IDs that don't have a matching record. Once you've generated that list, you can drop the temporary table.
That way, you're not trying to work with large volumes of data in PHP itself, so you shouldn't have any memory issues.
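A minimal sketch of that flow, assuming the dump is a one-ID-per-line file at /tmp/ids.txt and the application table is called records (adjust the names to match your schema):

```php
<?php
// Load the dumped IDs into a temporary table on the application's MySQL
// server, then anti-join to find the IDs with no matching record.
// Assumes $pdo is a PDO connection to the application database, opened
// with PDO::MYSQL_ATTR_LOCAL_INFILE => true so LOAD DATA LOCAL works.
$pdo->exec("CREATE TEMPORARY TABLE incoming_ids (id INT UNSIGNED NOT NULL PRIMARY KEY)");

// LOAD DATA is far faster than inserting 1.1 million rows from PHP.
$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/ids.txt' INTO TABLE incoming_ids (id)");

// LEFT JOIN ... IS NULL keeps only the IDs that have no matching record.
$missing = $pdo->query(
    "SELECT i.id
       FROM incoming_ids AS i
       LEFT JOIN records AS r ON r.id = i.id
      WHERE r.id IS NULL"
)->fetchAll(PDO::FETCH_COLUMN);

$pdo->exec("DROP TEMPORARY TABLE incoming_ids");
```

Make sure the id column in records is indexed (it usually is, as the primary key), so each lookup in the join is an index hit rather than a table scan.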
Upvotes: 3
Reputation: 6832
Assuming you can't join the tables since they are not on the same DB server, and assuming your server can handle it, I would populate an array with all the IDs from one DB, then loop over the IDs from the other and use in_array() to check whether each one exists.
BTW - according to this, you can make the in_array check more efficient.
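One common way to do that (not necessarily what the linked post describes) is to flip the array once and test membership with isset(), which is a hash lookup instead of the linear scan that in_array() performs:

```php
<?php
// Build a hash-style lookup from the application DB's IDs, then test each
// source ID with isset(). $sourceIds and $targetIds are illustrative names
// for the two ID lists.
$lookup = array_flip($targetIds); // keys become the IDs themselves

$missing = [];
foreach ($sourceIds as $id) {
    if (!isset($lookup[$id])) {
        $missing[] = $id; // no matching record in the application DB
    }
}
```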
Upvotes: 1