Marc B
Marc B

Reputation: 31

PHP- Compare two CSV files, look for duplicates and remove matching rows from one of the files

I'm trying my best to learn PHP and hack things out myself. But this part has me stuck.

I have two CSV files with hundreds of rows each.

CSV 1 looks like this:

name, email, interest

CSV 2 looks like this:

email only

I'm trying to write a script to compare the two files looking for duplicates. I only want to keep the duplicates. But as you can see, CSV 2 only contains an email. If an email in CSV 1 DOES NOT EXIST in CSV 2, then the row containing that email in CSV 1 should be deleted.

The end result can either overwrite CSV 1 or create a fresh new file called "final.csv"... whatever is easiest.

I would be grateful for the help.

I tried something along these lines with no luck:

egrep -v $(cat csv2.csv | tr '\n' '|' | sed 's/.$//') csv1.csv

and

grep -v -f csv22.csv csv1.csv >output-file

cheers,

marc

Upvotes: 1

Views: 4705

Answers (2)

Marc B
Marc B

Reputation: 31

Solved! The problem was with Mac line breaks. Look at the code below to see the additions at the beginning and end of the code to fix that problem. Thank you Crayon Violent for all of your help!

ini_set('auto_detect_line_endings',TRUE);
if (($file3 = fopen("output.csv", "w")) !== FALSE) {
  if (($file1 = fopen("dirty.csv", "r")) !== FALSE) {
    while (($file1Row = fgetcsv($file1)) !== FALSE) {
      if (($file2 = fopen("clean.csv", "r")) !== FALSE) {
        while (($file2Row = fgetcsv($file2)) !== FALSE) {
          if ( strtolower(trim($file2Row[0])) == strtolower(trim($file1Row[1])) )
            fputcsv($file3, $file1Row);             
        }
        fclose($file2);
      }
    }
    fclose($file1);
  }
  fclose($file3);
}
ini_set('auto_detect_line_endings',FALSE);

Upvotes: 0

CrayonViolent
CrayonViolent

Reputation: 32532

Here is a script that will loop through both files and output a 3rd file where email addresses in file2 are found in file1.

if (($file3 = fopen("file3.csv", "w")) !== FALSE) {
  if (($file1 = fopen("file1.csv", "r")) !== FALSE) {
    while (($file1Row = fgetcsv($file1)) !== FALSE) {
      if (($file2 = fopen("file2.csv", "r")) !== FALSE) {
        while (($file2Row = fgetcsv($file2)) !== FALSE) {
          if ( strtolower(trim($file2Row[0])) == strtolower(trim($file1Row[1])) )
            fputcsv($file3, $file1Row);             
        }
        fclose($file2);
      }
    }
    fclose($file1);
  }
  fclose($file3);
}

Couple of notes:

  • You may need to provide some additional arguments to fgetcsv, depending on how your csv is structured (e.g. delimiter, quotes)
  • Based on how you listed the contents of each file, this code reads the 2nd column of file1, and the 1st column of file2. If that's not really how they are positioned, you will need to change the number in the bracket for $file1Row[1] and $file2Row[0]. Column # starts at 0.
  • Script is current set to overwrite if file3.csv exists. If you want it to append instead of overwrite, change the 2nd argument of the $file3 fopen to "a" instead of "w"

Example:

file1.csv:

john,[email protected],blah
mary,[email protected],something
jane,[email protected],blarg
bob,[email protected],asdfsfd

file2.csv

[email protected]
[email protected]

file3.csv (generated)

mary,[email protected],something
bob,[email protected],asdfsfd

Upvotes: 2

Related Questions