Reputation: 13555
I am currently using PHP to find the entries I need: the ones that are valid and not duplicated.
My approach is to first create 5 arrays: 2 for the original data, 1 for entries with no mistakes (initially empty), 1 for valid entries (initially empty) and 1 for duplicates (initially empty).
Using one of the original arrays, for each element I check whether it is valid and whether it is a duplicate: if it is invalid, I put it into the invalid array, and I check for duplicates using in_array().
After that, I have one array of invalid entries and one of duplicates; I then go through the original array and check which elements are in neither of those two arrays. Job done.
My problem is that this seems quite inefficient. How can I improve it? (Preferably using some well-known algorithm.)
Thank you.
// get all duplicate input and store in an array
$invalid = array();
$duplicate = array();
$email = array();
for ($row = 1; $row <= $highestRow; $row++) {
    for ($y = 0; $y < $highestColumn; $y++) {
        $val = $sheet->getCellByColumnAndRow($y, $row)->getValue();
        // use a regexp to check whether it is valid
        if ($y == $mailColumn && !preg_match($pattern, $val)) {
            $invalid[] = $row;
        }
        // if valid, test whether it is a duplicate
        elseif ($y == $mailColumn && in_array($val, $email)) {
            $duplicate[] = $val;
            $duplicate[] = $row;
        }
        if ($y == $mailColumn) {
            $email[] = $val;
            $email = array_unique($email);
        }
    }
}
// make the invalid array unique, since I only need the invalid inputs, not invalid + duplicate
$invalid = array_unique($invalid);
Reputation: 15616
Try this:
<?php
echo "<pre>";
$array1 = array("[email protected]","c","c","[email protected]","test1","","test3","test2","test3");
// blank entries (empty or whitespace-only strings)
$array_no_mistake = array_filter($array1, function ($subject) {
    return trim($subject) == "";
});
// first occurrence of every non-blank value (original keys are preserved)
$array_uniq = array_diff(array_unique($array1), $array_no_mistake);
// later occurrences of values that appear more than once
$array_dups = array_diff_assoc(array_diff($array1, $array_no_mistake), $array_uniq);
// unique values that match the email pattern
$array_valid = array_filter($array_uniq, function ($subject) {
    return preg_match('/\A(?:[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z/i', $subject) === 1;
});
// unique values that do not match the pattern
$array_invalid = array_diff_assoc($array_uniq, $array_valid);
print_r($array1);
print_r($array_no_mistake);
print_r($array_uniq);
print_r($array_dups);
print_r($array_valid);
print_r($array_invalid);
?>
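To make it easier to see what each of the arrays above ends up holding, here is a minimal sketch of the same array_unique() / array_diff() / array_diff_assoc() pipeline run on a small hypothetical input. The example.com addresses and the short email pattern are assumptions for illustration only; the pattern is much looser than the one in the answer.

```php
<?php
// Hypothetical sample input: one address repeated, one malformed entry,
// one blank cell and one more valid address.
$input = ["a@example.com", "bad-entry", "a@example.com", "", "b@example.com"];

// Blank (empty or whitespace-only) entries.
$blank = array_filter($input, function ($s) { return trim($s) === ""; });

// First occurrence of every non-blank value; keys are preserved.
$unique = array_diff(array_unique($input), $blank);

// Later occurrences: same value as a unique entry, but a different key.
$dups = array_diff_assoc(array_diff($input, $blank), $unique);

// Simplified pattern (an assumption for this sketch, not a full email check).
$pattern = '/^[^@\s]+@[^@\s]+\.[a-z]+$/i';
$valid = array_filter($unique, function ($s) use ($pattern) {
    return preg_match($pattern, $s) === 1;
});
$invalid = array_diff_assoc($unique, $valid);

print_r($dups);    // contains [2 => 'a@example.com']
print_r($valid);   // contains [0 => 'a@example.com', 4 => 'b@example.com']
print_r($invalid); // contains [1 => 'bad-entry']
```

The detail that makes this work is that array_unique() and array_diff() keep the original keys, so array_diff_assoc() can tell the first occurrence of a value (kept in the unique array) from its later repeats (same value, different key).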
Reputation: 1587
1) You seem to be interested only in the email column, so there is no point in iterating over all the other columns (the inner loop is basically redundant).
2) You can use an associative array to store the emails as keys, and then look for duplicates efficiently by checking whether the key (the email) already exists in the array with isset(), instead of scanning the whole array with in_array().
Here's an example:
$valid = array();
$invalid = array();
$dups = array();
for ( $row = 1; $row <= $highestRow; $row++ )
{
$email = $sheet->getCellByColumnAndRow( $mailColumn, $row )->getValue();
if ( !preg_match( $pattern, $email ) )
{
$invalid[] = $row;
}
else if ( isset( $dups[ $email ] ) )
{
$dups[ $email ][] = $row;
}
else
{
$dups[ $email ] = array();
$valid[] = $row;
}
}
At the end of this, $invalid will hold a list of all the invalid rows, and $valid will hold the numbers of the valid rows (the first row in which each email appears). $dups will hold an array of arrays: each key is an email, and its value is an array listing the other rows that share that email. If the array for a certain email is empty, that email is not duplicated. No fancy algorithm, sorry...
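The same single-pass idea can be tried without a spreadsheet. The sketch below wraps the loop in a hypothetical classifyEmails() function that takes email values keyed by row number; the function name, the sample rows and the simplified pattern are illustrative assumptions, not part of the answer. Because PHP arrays are hash tables, isset($seen[$email]) is an average constant-time lookup, so the pass is roughly linear in the number of rows, whereas calling in_array() (and array_unique()) on a growing array for every row is at least quadratic.

```php
<?php
/**
 * Hypothetical helper (illustration only): classifies email values keyed by
 * row number in one pass. Returns [valid rows, invalid rows, email => repeat rows].
 */
function classifyEmails(array $emailsByRow, string $pattern): array
{
    $valid = [];
    $invalid = [];
    $seen = []; // email => rows that repeat it (its first row goes into $valid)

    foreach ($emailsByRow as $row => $email) {
        if (!preg_match($pattern, $email)) {
            $invalid[] = $row;
        } elseif (isset($seen[$email])) {
            // Key lookup in a hash table: no linear in_array() scan.
            $seen[$email][] = $row;
        } else {
            $seen[$email] = [];
            $valid[] = $row;
        }
    }

    // Extra step (not in the answer): keep only emails that occurred more than once.
    $dups = array_filter($seen, function ($rows) { return $rows !== []; });

    return [$valid, $invalid, $dups];
}

// Simplified pattern (an assumption for this sketch, not a full email check).
$pattern = '/^[^@\s]+@[^@\s]+\.[a-z]+$/i';
[$valid, $invalid, $dups] = classifyEmails(
    [1 => 'a@example.com', 2 => 'oops', 3 => 'a@example.com', 4 => 'b@example.com'],
    $pattern
);
```

With these sample rows, $valid ends up as [1, 4], $invalid as [2] and $dups as ['a@example.com' => [3]]. Note that isset() returns true for a key whose value is an empty array, which is why the first sighting can store [] and still be detected later.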