Reputation: 4163
I need to compare two arrays like this:
// currently in our database
$firstArr = array(
    ["id" => 1, "another_id" => 1],
    ["id" => 2, "another_id" => 4],
    ["id" => 5, "another_id" => 9]
);
// currently fetched from csv-files
$secondArr = array(
    ["id" => 6, "another_id" => 3],
    ["id" => 2, "another_id" => 7],
    ["id" => 1, "another_id" => 1]
);
The first array represents the data that is currently in our database, while the second one represents the data delivered from a CSV file.
In order not to delete all of the data from the database, I need to compare the two arrays. If the CSV file contains data that is not in the database, I want to insert those data sets. If the database contains data that is not in the CSV file, I want to remove it from the database.
I came up with a solution that works for a small amount of data in the arrays:
$new_to_database = array();
foreach ($secondArr as $arr) {
    $in_database = array_filter(array_map(function ($el) use ($arr) {
        if ($el['id'] == $arr['id'] && $el['another_id'] == $arr['another_id']) {
            return $el;
        }
    }, $firstArr));
    if (count($in_database) === 0) {
        $new_to_database[] = $arr;
    }
}
var_dump($new_to_database); // input later on
// array(2) { [0]=> array(2) { ["id"]=> int(6) ["another_id"]=> int(3) } [1]=> array(2) { ["id"]=> int(2) ["another_id"]=> int(7) } }
The problem is that each array contains roughly 5,000 data sets. As a result, the script takes far too long and I get the error that the maximum execution time of 360 seconds is exceeded.
How can I solve this? I guess the algorithm needs to be much more efficient.
Upvotes: 2
Views: 111
Reputation: 78994
You can index each array by the id and then compute the difference in the keys:
// Get items that are in CSV but not in database
$new = array_diff_key(array_column($csv, null, 'id'), array_column($db, null, 'id'));
// Get items that are in database but not in CSV
$del = array_diff_key(array_column($db, null, 'id'), array_column($csv, null, 'id'));
Here's a DEMO using two simple arrays of 50,000 items each.
That's short and sweet, but you may not want to run array_column() multiple times:
$csv = array_column($csv, null, 'id');
$db = array_column($db, null, 'id');
// Get items that are in CSV but not in database
$new = array_diff_key($csv, $db);
// Get items that are in database but not in CSV
$del = array_diff_key($db, $csv);
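Once you have $new and $del you can apply them to the database. A minimal sketch, assuming a PDO connection in $pdo and a table named items with columns id and another_id (both names are placeholders for whatever your schema actually uses):
// Insert the rows that are new in the CSV
$insert = $pdo->prepare('INSERT INTO items (id, another_id) VALUES (?, ?)');
foreach ($new as $row) {
    $insert->execute([$row['id'], $row['another_id']]);
}
// Remove the rows that are no longer in the CSV
$delete = $pdo->prepare('DELETE FROM items WHERE id = ?');
foreach ($del as $row) {
    $delete->execute([$row['id']]);
}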
To compare the entire array contents (this will work if you add more than just id and another_id), you can map each array, serializing each row into a string representation, and then compare. Note that the order of the elements matters as well:
// Get items that are in CSV but not in database
$new = array_map('unserialize',
    array_diff(array_map('serialize', $csv), array_map('serialize', $db)));
// Get items that are in database but not in CSV
$del = array_map('unserialize',
    array_diff(array_map('serialize', $db), array_map('serialize', $csv)));
On the DEMO site this one worked for 40,000 array items but not 50,000. This will depend on your resources and php.ini settings.
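If the key order of the rows could differ between the CSV and the database, you can normalize each row with ksort() before serializing so the string comparison becomes order-insensitive. This is a small addition to the serialize approach above, not part of the original code:
$normalize = function (array $row) {
    // Sort by key so ['id' => 1, 'another_id' => 1] and ['another_id' => 1, 'id' => 1] serialize identically
    ksort($row);
    return serialize($row);
};
$new = array_map('unserialize',
    array_diff(array_map($normalize, $csv), array_map($normalize, $db)));
$del = array_map('unserialize',
    array_diff(array_map($normalize, $db), array_map($normalize, $csv)));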
There are also array_intersect_key() and array_intersect() if you want to see what is the same in each array.
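For example, with the id-indexed $csv and $db from the snippets above:
// Rows whose id exists in both the CSV and the database
$common = array_intersect_key($csv, $db);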
Upvotes: 2
Reputation: 377
You are doing a lot of loops; foreach, array_filter() and array_map() take a lot of time.
$new_to_database = array_udiff($secondArr, $firstArr, function ($s, $f) {
    // array_udiff() sorts internally, so the callback must return a real
    // three-way comparison (negative, zero, positive), not just 0 or -1.
    // PHP 7+: the spaceship operator on arrays gives exactly that.
    return [$s['id'], $s['another_id']] <=> [$f['id'], $f['another_id']];
});
This gives the results you need with fewer iterations.
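Run in the other direction, the same comparator gives the rows that should be removed from the database (a sketch following the same pattern, not part of the original answer):
// Rows that are in the database but no longer in the CSV
$removed = array_udiff($firstArr, $secondArr, function ($f, $s) {
    return [$f['id'], $f['another_id']] <=> [$s['id'], $s['another_id']];
});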
Upvotes: 1