Reputation: 51
Hello my dear saviors,
I opened a thread a few days ago asking a way to search/filter rows by a cell value with PHPExcel. A developer from this community saved my day (thanks again man!!!). I've been struggling since now with another question. ¿Is it possible to delete all duplicated rows with PHPExcel? To put you in situation here's my example table:
Want to show all rows having "I1/027" in his cell (DONE!), but without repeated rows:
Hours | Place | Name
------|-------|-----------------
3 |I1/027 | example1 //------> Want to add it to my list!!!
6 |I2/025 | example2 //------> Ignore this (no I1/027)
7 |I1/030 | example3 //------> Ignore this (no I1/027)
2 |I1/027 | example4 //------> Want to add it to my list!!!
3 |I1/027 | example1 //------> Don't want this row, it's repeated!!!
And the phpexcel code:
<?php if(isset($_FILES['file']['name'])) { ?>
<!-- Container progress bar -->
<div id="progress" style="width:500px;border:1px solid #ccc;"></div>
<!-- progress info -->
<div id="information" style="width"></div>
<?php require_once 'reader/Classes/PHPExcel/IOFactory.php';
//Extra functions
function get_cell($cell, $objPHPExcel){
//Cell selection
$objCell = ($objPHPExcel->getActiveSheet()->getCell($cell));
//taking cell value
return $objCell->getvalue();
}
function pp(&$var){
$var = chr(ord($var)+1);
return true;
}
//==========Displaying Code
$name = $_FILES['file']['name'];
$tname = $_FILES['file']['tmp_name'];
$type = $_FILES['file']['type'];
if($type == 'application/vnd.ms-excel')
{ // excel 97 extension
$ext = 'xls';
}
else if($type == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
{ // excel 2007 and 2010 extensions
$ext = 'xlsx';
}else
{ // invalid extension
echo -1;
exit();
}
$xlsx = 'Excel2007';
$xls = 'Excel5';
//read creator
$objReader = PHPExcel_IOFactory::createReader($$ext);
//loading
$objPHPExcel = $objReader->load($tname);
$dim = $objPHPExcel->getActiveSheet()->calculateWorksheetDimension();
// put $start and $end array
list($start, $end) = explode(':', $dim);
if(!preg_match('#([A-Z]+)([0-9]+)#', $start, $rslt)){
return false;
}
list($start, $start_h, $start_v) = $rslt;
if(!preg_match('#([A-Z]+)([0-9]+)#', $end, $rslt)){
return false;
}
list($end, $end_h, $end_v) = $rslt;
//starting to read excel doc
$table = "<table class='tabla'>";
for($v=$start_v; $v<=$end_v; $v++){
// calculate progress bar
$percent = intval($v/$end_v * 100)."%";
// progress bar update
echo '<script language="javascript">
document.getElementById("progress").innerHTML="<div style=\"width:'.$percent.';background-color:#ddd;\"> '.$percent.'</div>";
document.getElementById("information").innerHTML="'.$v.' files processed.";</script>';
// buffer flush
echo str_repeat(' ',1024*64);
// send exit to navigator
flush();
sleep(0.25);
//horizontal reading
$tempRow= "<tr>";
$contentFound=false;
for($h=$start_h; ord($h)<=ord($end_h); pp($h)){
$cellValue = get_cell($h.$v, $objPHPExcel);
$tempRow.= "<td>";
if($cellValue !== null){
if($cellValue=="I1/027") $contentFound=true;
$tempRow.= $cellValue;
}
$tempRow.= "</td>";
}
$tempRow.= "</tr>";
if($contentFound) $table.=$tempRow;
}
// process completed
echo '<script language="javascript">document.getElementById("information").innerHTML="Process completed"</script><br>';
echo $table;
}?>
Found this function I'm trying to exploit, but without luck:
function removeDuplicates($inputFileName, $objPHPExcel) {
$worksheet = $objPHPExcel->getActiveSheet();
$urn = array();
foreach ($worksheet->getRowIterator() as $row) {
$rowIndex = $row->getRowIndex();
$cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
array_push($urn, $cellValue);
}
$numberOfURNs = count($urn);
for ($rowIndex = $numberOfURNs; $rowIndex != 1; $rowIndex--) {
$cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
for ($i = $rowIndex - 2; $i != 0; $i--) {
if ($urn[$i] == $cellValue) {
$worksheet->removeRow($rowIndex);
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
$objWriter->save($inputFileName);
break;
}
}
}
return $objPHPExcel = checkExtension($inputFileName);
}
Thanks a lot in advance!!
Upvotes: 1
Views: 1948
Reputation: 37
$sheets = $spreadsheet->getAllSheets();
foreach ($sheets as $sheet_index => $sheet) {
$rows = $sheet->toArray();
$rows = array_map("unserialize", array_unique(array_map("serialize", $rows)));
$sheet->fromArray($rows);
}
For everyone still looking for an easy solution, this will clean a whole file. Off course you can select the sheet you wan't to remove duplicates on.
Upvotes: 0
Reputation: 2641
There are multiple solutions. All of them have some disadvantages. You could compare all rows against all rows in the excel file which would mean, that you need to traverse the whole file for each row. If you have 1000 rows, that is 1000 * 1000 comparisons. The runtime is O(n^2) if you are familiar with this notation which basically means that it will become slow as hell if your number of rows increases.
You could read all of the values into an array of arrays and let PHP do the hard stuff using array_unique. An example of using it with multidimensional arrays is presented in the PHP docs. It works by serializing the inner array, then using array_unqie and deserializing again. I'm not aware of how the array functions in PHP are implemented, but serializing / deserializing probably takes a lot of time. Additionally all the values are present in memory, which could become a problem if your Excel files are quite big.
The third possibilty I can image is inserting all data into a database, then let the database do the duplicate checking by using the distinct keyword. So just import it to an import table and then use a insert from select statement to insert it in the correct table without duplicates. Afterwards delete the contents of the import table again. I think this would be my prefered solution if the data has to be inserted into the database anyway
Upvotes: 3