Alex

Reputation: 51

How Can I Remove Duplicate Rows from a CSV File with PHP?

I have a CSV file that looks like this:

account, name, email,
123, John, [email protected]
123, John, [email protected]
1234, Alex, [email protected]

I need to remove the duplicate rows. I tried to do it like this:

$inputHandle = fopen($inputfile, "r");
$csv = fgetcsv($inputHandle, 1000, ",");

$accounts_unique = array();

$accounts_unique = array_unique($csv);  

print("<pre>".print_r($accounts_unique, true)."</pre>");

But print_r shows only the first row (the headers). What do I need to do so that 1. the CSV file is cleaned of duplicate rows, and 2. I get a list of those duplicates (maybe stored in another CSV)?

Upvotes: 1

Views: 4081

Answers (3)

user3718742

Reputation: 41

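If you are going to loop over the data from the CSV anyway, I think it would be best to do something like this:

$dataset = array();
$inputHandle = fopen($inputfile, "r");
while (($data = fgetcsv($inputHandle, 1000, ",")) !== false) {
    // Key each row by a hash of its contents, so a repeated row overwrites itself.
    $dataset[sha1(serialize($data))] = $data;
}
fclose($inputHandle);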
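To cover both parts of the question (a cleaned file plus a list of the duplicates), the hash-keyed loop can be extended roughly as follows. This is only a sketch: the function name `splitDuplicateRows` and the three file paths are hypothetical, and it assumes that "duplicate" means every field in the row is identical.

```php
<?php
// Hypothetical helper: the name and parameters are illustrative, not from the question.
// Copies first occurrences to $uniqueFile and repeated rows to $duplicateFile.
function splitDuplicateRows(string $inputFile, string $uniqueFile, string $duplicateFile): int
{
    $in = fopen($inputFile, 'r');
    $unique = fopen($uniqueFile, 'w');
    $dupes = fopen($duplicateFile, 'w');
    if ($in === false || $unique === false || $dupes === false) {
        throw new RuntimeException('Could not open one of the CSV files.');
    }

    $seen = array();      // row hash => true
    $duplicateCount = 0;

    // Length 0 means "no line length limit"; the remaining arguments are the defaults, spelled out.
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue; // fgetcsv() returns array(null) for a blank line
        }
        $key = sha1(serialize($row));
        if (isset($seen[$key])) {
            fputcsv($dupes, $row, ',', '"', '\\');   // already seen: log it as a duplicate
            $duplicateCount++;
        } else {
            $seen[$key] = true;
            fputcsv($unique, $row, ',', '"', '\\');  // first occurrence: keep it
        }
    }

    fclose($in);
    fclose($unique);
    fclose($dupes);

    return $duplicateCount;
}
```

Because rows are processed one at a time, only the hashes are held in memory rather than the whole file. Note that fputcsv() may add quotes around fields that contain spaces, so the output can differ cosmetically from the input.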

Upvotes: 1

sectus

Reputation: 15464

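A simple solution, though it requires a lot of memory if the file is really big:

// FILE_IGNORE_NEW_LINES strips line endings so implode() does not double them.
$lines = file('csv.csv', FILE_IGNORE_NEW_LINES);
$lines = array_unique($lines);
// Write the result back; the target filename is assumed to be the same file.
file_put_contents('csv.csv', implode(PHP_EOL, $lines) . PHP_EOL);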

Upvotes: 4

Jacob S

Reputation: 1703

I would go this route, which will be faster than array_unique:

// Read every line as a trimmed string; array_flip() only works on string or integer values.
$csv = array_map('trim', file($inputfile));
$data = array_flip(array_flip($csv)); // removes duplicates that are exactly the same
$dropped = array_diff_key($csv, $data); // get the removed items

Note: array_unique() and array_flip(array_flip()) only match duplicate lines that are exactly the same.
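If rows that differ only in spacing or letter case should also count as duplicates, one option is to compare a normalized key instead of the raw line. The following is a sketch under that assumption; `normalizedRowKey` is a hypothetical helper, and whether case should be ignored depends on your data.

```php
<?php
// Hypothetical helper: builds a comparison key that ignores surrounding
// whitespace and letter case in each field (both are assumptions).
function normalizedRowKey(array $row): string
{
    $clean = array_map(function ($field) {
        return strtolower(trim((string) $field));
    }, $row);
    // Join with the ASCII unit separator, which is unlikely to appear inside a field.
    return implode("\x1F", $clean);
}

// Sample rows for illustration only.
$rows = array(
    array('123', 'John'),
    array('123', ' john '),
    array('1234', 'Alex'),
);

$unique = array();
foreach ($rows as $row) {
    $key = normalizedRowKey($row);
    if (!isset($unique[$key])) {
        $unique[$key] = $row; // keep the first spelling seen
    }
}
$unique = array_values($unique);
```

Here the second row is treated as a repeat of the first, so `$unique` ends up with two rows, each kept in its original spelling.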

Updated to include information from my comments.

Upvotes: 1
