user782104

Reputation: 13555

Algorithm for finding valid and non-duplicate entries in PHP

I am currently using PHP to find the valid and non-duplicated entries in a list of input. I need:

  1. a list of valid and non-duplicated entries
  2. a list of invalid input (unique)
  3. a list of duplicate input

My approach is to first create 5 arrays: 2 for the original input, 1 for no mistake (empty), 1 for valid (empty) and 1 for duplicate (empty).

First, using one of the original arrays, for each element I check whether it is valid and whether it is a duplicate. If it is invalid, I put it into the invalid array; duplicates are detected with in_array.

After that, I have one array of invalid entries and one of duplicates. Then, using the original array, I check which elements are in neither of those two arrays, and the job is done.

My problem is that this seems quite inefficient. How can I improve it? (Preferably using some well-known algorithm.)

Thank you.

// get all duplicate input and store in an array
for ($row = 1; $row <= $highestRow; $row++) {
    for ($y = 0; $y < $highestColumn; $y++) {
        $val = $sheet->getCellByColumnAndRow($y, $row)->getValue();

        // use a regular expression to check whether it is valid
        if ($y == $mailColumn && !preg_match($pattern, $val)) {
            $invaild[] = $row;
        }
        // if valid, test whether it is a duplicate
        elseif ($y == $mailColumn && in_array($val, $email)) {
            $duplicate[] = $val;
            $duplicate[] = $row;
        }

        if ($y == $mailColumn) {
            $email[] = $val;
            $email = array_unique($email);
        }
    }
}

// make the invalid array unique, since I just need the invalid inputs, not the invalid + duplicate input
$invaild = array_unique($invaild);

Upvotes: 0

Views: 173

Answers (2)

Taha Paksu

Reputation: 15616

Try this:

<?php     
echo "<pre>";    

$array1 = array("[email protected]","c","c","[email protected]","test1","","test3","test2","test3");    

$array_no_mistake = array_filter($array1,function($subject){if(trim($subject)=="") return true;});    

$array_uniq = array_diff(array_unique($array1),$array_no_mistake);    

$array_dups = array_diff_assoc(array_diff($array1,$array_no_mistake),$array_uniq);    

$array_valid = array_filter($array_uniq,function($subject){    
    if (preg_match('/\A(?:[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)\Z/i', $subject)) {    
        return true;
    } else {
        return false;
    }
});    


$array_invalid = array_diff_assoc($array_uniq,$array_valid);    


print_r($array1);    

print_r($array_no_mistake);    

print_r($array_uniq);    

print_r($array_dups);    

print_r($array_valid);    

print_r($array_invalid);    

?>
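
A smaller sketch of why the array_unique / array_diff_assoc combination above isolates the repeated entries. The sample values are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample data, chosen only to show the key handling.
$input = array("x", "y", "x", "z", "y");

// array_unique() keeps the first occurrence of each value and preserves its key,
// so the result has keys 0, 1 and 3.
$firstSeen = array_unique($input);

// array_diff_assoc() compares both key and value, so only the later
// repeats (keys 2 and 4) are left over.
$repeats = array_diff_assoc($input, $firstSeen);

print_r($firstSeen);
print_r($repeats);
```

Because the keys are preserved, the leftover keys also tell you at which positions the repeats occurred.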

Upvotes: 1

Yaniro

Reputation: 1587

1) It seems you are only interested in the email column, so there is no point in iterating over all of the other columns (the inner loop is basically redundant).

2) You can use an associative array to store emails as keys and later look for duplicates efficiently by checking whether the key (the email) already exists in the array.

Here's an example:

$valid   = array();
$invalid = array();
$dups    = array();

for ( $row = 1; $row <= $highestRow; $row++ )
{
    $email = $sheet->getCellByColumnAndRow( $mailColumn, $row )->getValue();
    if ( !preg_match( $pattern, $email ) )
    {
        $invalid[] = $row;
    }
    else if ( isset( $dups[ $email ] ) )
    {
        $dups[ $email ][] = $row;
    }
    else
    {
        $dups[ $email ] = array();
        $valid[] = $row;
    }
}

At the end of this, $invalid holds the row numbers of all invalid entries. $dups is an array of arrays keyed by email: each value lists the later rows that repeat that email, so an empty array means the email is not duplicated. $valid holds the row numbers of the valid entries (the first occurrence of each email). No fancy algorithm, sorry...
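
The same idea can be tried without a spreadsheet. The following is only a sketch on a plain array: the function name classifyEmails and the sample addresses are hypothetical, and filter_var() with FILTER_VALIDATE_EMAIL is an assumption standing in for the $pattern regex from the question. As in the loop above, the first occurrence of an email counts as valid and only later repeats are reported as duplicates.

```php
<?php
// Classify entries as valid, invalid or duplicate in a single pass.
// classifyEmails() is a hypothetical helper, not part of any library.
function classifyEmails(array $emails)
{
    $seen = array();        // email => true, used as a hash set
    $invalid = array();     // raw invalid values, made unique at the end
    $duplicates = array();  // email => true, so each duplicate is listed once

    foreach ($emails as $email) {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            $invalid[] = $email;
        } elseif (isset($seen[$email])) {
            // isset() on a key is a hash lookup, unlike in_array(),
            // which scans the whole array on every call.
            $duplicates[$email] = true;
        } else {
            $seen[$email] = true;
        }
    }

    return array(
        'valid'      => array_keys($seen),
        'invalid'    => array_values(array_unique($invalid)),
        'duplicates' => array_keys($duplicates),
    );
}

// Made-up sample input.
$result = classifyEmails(array(
    'a@example.com', 'nope', 'b@example.com', 'a@example.com', 'nope', '',
));
print_r($result);
```

If emails that occur more than once should be excluded from the valid list entirely, array_diff(\$result['valid'], \$result['duplicates']) would give that stricter list.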

Upvotes: 1
