Reputation: 2034
I've got an array like this:
$a = array(
array(2 => 1, 4 => 2, 9 => 3),
array(3 => 7, 4 => 5, 7 => 3),
array(1 => 6, 4 => 5),
...
);
So the array contains a huge amount of sub arrays with integer key => integer value. Now I want to find subarrays which share no keys or if they share a key the value of this key must be the same.
Example: $a[1] and $a[2] would match because $a[1][4] == $a[2][4] and no other keys match. But $a[0] and $a[1] would not match because $a[0][4] != $a[1][4].
The number of elements in the subarrays may vary.
Is there an efficient way to do this ? The only way I can think of is check each possible pair in a nested loop resulting in O(n^2).
If someone has an idea for a more meaningful title feel free to edit it.
Maybe code makes it more clear: (naive implementation)
$pairs = array();
for($i = 0; $i < count($a); $i++)
for($j = $i+1; $j < count($a); $j++)
if(array_intersect_key($a[$i], $a[$j]) == array_intersect_assoc($a[$i], $a[$j]))
$pairs[] = array($i, $j);
Alternative:
$matching = array();
for($i = 0; $i < count($a); $i++)
for($j = $i+1; $j < count($a); $j++)
if(array_intersect_key($a[$i], $a[$j]) == array_intersect_assoc($a[$i], $a[$j]))
list($matching[$i][], $matching[$j][]) = array($j, $i);
Upvotes: 1
Views: 763
Reputation: 3269
There might be ways to do it, but it somewhat depends on if you know how many matches are likely (or the general 'matchyness' of your data). If there's more matches than not it might be better to start with assuming everything matches and eliminating.
In any case, I think you can pre-process the data. I'm not sure if this is faster -- it really depends on the distribution of your data, but I'd start by trying something like this and work from there:
$a = array(
array(2 => 1, 4 => 2, 9 => 3),
array(3 => 7, 4 => 5, 7 => 3),
array(1 => 6, 4 => 5),
array(1 => 6, 4 => 5, 7 => 5),
array(2 => 1, 4 => 2, 9 => 3)
);
// 1 and 2 match, 2 and 3 match, 0 and 4 match
$keyData = array();
for ($i = 0; $i < count($a); $i++) {
foreach($a[$i] as $k => $v) {
if (!isset($keyData[$k])) {
$keyData[$k] = array();
}
if (!isset($keyData[$k][$v])) {
$keyData[$k][$v] = array();
}
$keyData[$k][$v][] = $i;
}
}
$potentialMatches = array();
foreach ($keyData as $key => $values) {
// Ignore single key/value pairs
if (count($values) > 1) {
foreach ($values as $value => $arrayIndices) {
for ($i = 0; $i < count($arrayIndices); $i ++) {
for ($j = $i + 1; $j < count($arrayIndices); $j ++) {
$potentialMatches[] = array($arrayIndices[$i], $arrayIndices[$j]);
}
}
}
}
}
// You might need to do this ...
/*
foreach ($potentialMatches as &$m) {
array_unique($m);
}
*/
$pairs = array();
foreach ($potentialMatches as $m) {
if(array_intersect_key($a[$m[0]], $a[$m[1]])
== array_intersect_assoc($a[$m[0]], $a[$m[1]])) {
$pairs[] = $m;
}
}
print_r($pairs);
Output:
Array
(
[0] => Array
(
[0] => 0
[1] => 4
)
[1] => Array
(
[0] => 1
[1] => 2
)
[2] => Array
(
[0] => 2
[1] => 3
)
)
EDIT
As I said in my comment, that doesn't catch arrays that don't share any keys -- which you consider a match. The code below does this, although I'm not sure if it's faster than the nested solution (and it's going to use a ton of memory)
// New test data to cover the case I missed
$a = array(
array(2 => 1, 4 => 2, 9 => 3),
array(3 => 7, 4 => 5, 7 => 3),
array(1 => 6, 4 => 5),
array(1 => 6, 4 => 5, 7 => 5),
array(2 => 1, 4 => 2, 9 => 3),
array(8 => 3)
);
// 1 and 2 match, 2 and 3 match, 0 and 4 match, 5 matches all
// First assume everything is a match, build an array of:
// indicies => array of potential matches
$potentialMatches = array_fill(0, count($a), array_keys($a));
// Build data about each key, the indicies that contain that key
// and the indicies for each value of that key
$keyData = array();
for ($i = 0; $i < count($a); $i++) {
foreach($a[$i] as $k => $v) {
if (!isset($keyData[$k])) {
$keyData[$k] = array();
}
if (!isset($keyData[$k][$v])) {
$keyData[$k][$v] = array();
}
$keyData[$k]['all'][] = $i;
$keyData[$k][$v][] = $i;
}
}
// print_r($keyData);
// Now go through the key data and eliminate indicies that
// can't match
foreach ($keyData as $key => $values) {
if (count($values) > 2) { // Ignore single key/value pairs
// Two indecies do not match if they appear in seperate value lists
// First get the list of all indicies that have this key
$all = array_unique($values['all']);
unset($values['all']);
// Now go through the value lists
foreach ($values as $value => $arrayIndices) {
// The indicies for this value cannot match the other
// indices in the system, i.e. this list
$cantMatch = array_diff($all, $arrayIndices);
// So remove the indicies that can't match from the potentials list
foreach ($arrayIndices as $index) {
$potentialMatches[$index] = array_diff($potentialMatches[$index], $cantMatch);
}
}
}
}
//print_r($potentialMatches);
// You said you didn't mind the output format, so that's probably enough
// but that array contains (x,x) which is pointless and both (x,y) and (y,x)
// so we can do one final bit of processing to print it out in a nicer way
$pairs = array();
foreach ($potentialMatches as $x => $matches) {
foreach ($matches as $y) {
if ( ($x < $y) ) {
$pairs[] = array($x, $y);
}
}
}
print_r($pairs);
Output
Array
(
[0] => Array
(
[0] => 0
[1] => 4
)
[1] => Array
(
[0] => 0
[1] => 5
)
[2] => Array
(
[0] => 1
[1] => 2
)
[3] => Array
(
[0] => 1
[1] => 5
)
[4] => Array
(
[0] => 2
[1] => 3
)
[5] => Array
(
[0] => 2
[1] => 5
)
[6] => Array
(
[0] => 3
[1] => 5
)
[7] => Array
(
[0] => 4
[1] => 5
)
)
Upvotes: 1
Reputation: 3844
if you are looking for adjacent matching,
$temp = null;
$last_result = array();
foreach($a as $key => $value){
if(is_null($temp)){
$temp = $value;
} else{
$result = array_intersect_assoc($temp, $value);
if(!empty($result))
array_push($last_result, $result);
$temp = $value;
}
}
print_r($last_result);
otherwise just use array_intersect_assoc
for a example you can do like this
$res = array_intersect_assoc($a[0],$a[1],$a[2]);
print_r($res);
Upvotes: 0