Reputation: 2147
I have a set of approximately 1.1 million unique IDs and I need to determine which of them do not have a corresponding record in my application's database. The set of IDs comes from a database as well, but not the same one. I am using PHP and MySQL and have plenty of memory: PHP runs on a server with 15GB of RAM, and MySQL runs on its own server with 7.5GB.
Normally I'd simply load all the IDs in one query and then use them in the IN clause of a SELECT query to do the comparison in one shot.
So far my attempts have resulted in scripts that either take an unbearably long time or spike the CPU to 100%.
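For reference, the one-shot version I'd normally write looks something like this (a sketch only; the records table and the variable names are illustrative):

```php
<?php
// Sketch of the usual one-shot approach: bind every ID into a single
// IN (...) list. $pdo is a PDO connection to the application database
// and $idsToCheck holds the ~1.1 million IDs (illustrative names).
$placeholders = implode(',', array_fill(0, count($idsToCheck), '?'));
$stmt = $pdo->prepare("SELECT id FROM records WHERE id IN ($placeholders)");
$stmt->execute($idsToCheck);
$found = $stmt->fetchAll(PDO::FETCH_COLUMN);

// Whatever is in $idsToCheck but not in $found has no matching record.
$missing = array_diff($idsToCheck, $found);
```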
What's the best way to load such a large data set and do this comparison?
Upvotes: 0
Views: 1420
Reputation: 212452
Generate a dump of the IDs from the first database into a file, re-load it into a temporary table on the second database, and then do a LEFT JOIN between that temporary table and the second database's table to identify the IDs that don't have a matching record. Once you've generated that list, you can drop the temporary table.
That way, you're not trying to work with large volumes of data in PHP itself, so you shouldn't have any memory issues.
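A minimal sketch of that flow, assuming the dump is a one-ID-per-line file at /tmp/ids.txt and the application table is called records (adjust the names to match your schema):

```php
<?php
// Load the dumped IDs into a temporary table on the application's MySQL
// server, then anti-join to find the IDs with no matching record.
// Assumes $pdo is a PDO connection to the application database, opened
// with PDO::MYSQL_ATTR_LOCAL_INFILE => true so LOAD DATA LOCAL works.
$pdo->exec("CREATE TEMPORARY TABLE incoming_ids (id INT UNSIGNED NOT NULL PRIMARY KEY)");

// LOAD DATA is far faster than inserting 1.1 million rows from PHP.
$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/ids.txt' INTO TABLE incoming_ids (id)");

// LEFT JOIN ... IS NULL keeps only the IDs that have no matching record.
$missing = $pdo->query(
    "SELECT i.id
       FROM incoming_ids AS i
       LEFT JOIN records AS r ON r.id = i.id
      WHERE r.id IS NULL"
)->fetchAll(PDO::FETCH_COLUMN);

$pdo->exec("DROP TEMPORARY TABLE incoming_ids");
```

Make sure the id column in records is indexed (it usually is, as the primary key), so each lookup in the join is an index hit rather than a table scan.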
Upvotes: 3
Reputation: 6832
Assuming you can't join the tables since they are not on the same DB server, and assuming your server can handle it, I would populate an array with all the IDs from one DB, then loop over the IDs from the other and use in_array() to check whether each one exists.
BTW - according to this, you can make the in_array check more efficient.
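One common way to do that (not necessarily what the linked post describes) is to flip the array once and test membership with isset(), which is a hash lookup instead of the linear scan that in_array() performs:

```php
<?php
// Build a hash-style lookup from the application DB's IDs, then test each
// source ID with isset(). $sourceIds and $targetIds are illustrative names
// for the two ID lists.
$lookup = array_flip($targetIds); // keys become the IDs themselves

$missing = [];
foreach ($sourceIds as $id) {
    if (!isset($lookup[$id])) {
        $missing[] = $id; // no matching record in the application DB
    }
}
```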
Upvotes: 1