Reputation: 51
I have a CSV file that looks like this:
account, name, email,
123, John, [email protected]
123, John, [email protected]
1234, Alex, [email protected]
I need to remove the duplicate rows. I tried to do it like this:
$inputHandle = fopen($inputfile, "r");
$csv = fgetcsv($inputHandle, 1000, ",");
$accounts_unique = array();
$accounts_unique = array_unique($csv);
print("<pre>".print_r($accounts_unique, true)."</pre>");
But print_r only shows the first (header) row. What do I need to do so that 1. the CSV file is cleaned of duplicate rows, and 2. I can build a list of those duplicates (maybe store them in another CSV)?
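What I am essentially after is something along these lines, reading the file row by row since fgetcsv() only returns one row per call (a rough sketch; "unique.csv" and "duplicates.csv" are just placeholder names):

$inputHandle  = fopen($inputfile, "r");
$uniqueHandle = fopen("unique.csv", "w");      // placeholder name for the cleaned file
$dupeHandle   = fopen("duplicates.csv", "w");  // placeholder name for the list of removed rows

$seen = array();
while (($row = fgetcsv($inputHandle, 1000, ",")) !== false) {
    $key = implode(",", $row);         // identical rows produce the same key
    if (isset($seen[$key])) {
        fputcsv($dupeHandle, $row);    // already seen once: record it as a duplicate
    } else {
        $seen[$key] = true;
        fputcsv($uniqueHandle, $row);  // first occurrence: keep it
    }
}

fclose($inputHandle);
fclose($uniqueHandle);
fclose($dupeHandle);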
Upvotes: 1
Views: 4081
Reputation: 41
If you are going to loop over the data from the CSV anyway, I think it would be best to do something like this:
$dataset = array();
// $lines holds the rows of the CSV, one string per line
foreach ($lines as $data) {
    $dataset[sha1($data)] = $data; // identical rows hash to the same key, so duplicates overwrite each other
}
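For completeness, a sketch of how that loop could be fed and written back out (the file() call and the 'deduped.csv' name are assumptions, not part of the original answer):

$lines = file($inputfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); // load the CSV rows as strings

$dataset = array();
foreach ($lines as $data) {
    $dataset[sha1($data)] = $data; // duplicates collapse onto the same key
}

file_put_contents('deduped.csv', implode(PHP_EOL, $dataset) . PHP_EOL);   // 'deduped.csv' is an example name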
Upvotes: 1
Reputation: 15464
A simple solution, but it requires a lot of memory if the file is really big.
$lines = file('csv.csv', FILE_IGNORE_NEW_LINES);
$lines = array_unique($lines);
file_put_contents('csv.csv', implode(PHP_EOL, $lines));
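If the file is too large to load at once, a line-by-line variant that keeps only a hash per unique row is one way around the memory issue (a sketch; the file names are just examples):

$in  = fopen('csv.csv', 'r');
$out = fopen('csv_unique.csv', 'w');  // example output name

$seen = array();
while (($line = fgets($in)) !== false) {
    $hash = sha1($line);
    if (!isset($seen[$hash])) {   // first time this exact row is seen
        $seen[$hash] = true;
        fwrite($out, $line);      // keep it; later identical rows are skipped
    }
}

fclose($in);
fclose($out);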
Upvotes: 4
Reputation: 1703
I would go this route, which will be faster than array_unique:
$lines = file($inputfile, FILE_IGNORE_NEW_LINES); // read every row as a string
$data = array_flip(array_flip($lines));           // removes duplicates that are exactly the same (keeps the last occurrence)
$dropped = array_diff_key($lines, $data);          // get the removed (duplicate) rows
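To also cover the second part of the question (a list of what was removed), both arrays could then be written out, roughly like this ('clean.csv' and 'duplicates.csv' are just example names):

file_put_contents('clean.csv', implode(PHP_EOL, $data) . PHP_EOL);         // the de-duplicated rows
file_put_contents('duplicates.csv', implode(PHP_EOL, $dropped) . PHP_EOL); // the rows that were dropped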
Note: array_unique() and array_flip(array_flip()) will only match duplicate lines that are exactly the same.
Updated to include information from my comments.
Upvotes: 1