Removing lines from CSV using PHP where there are more than one similar ID in a column

Question

I have ran into a problem with CSV and PHP and I have limited experience with CSVs in general. I have a CSV file with data:

name;id;date
john;222;17.07.2018
john;222;29.10.2018
mary;333;01.11.2018
mary;333;02.11.2018
steve;444;05.11.2018
kyle;555;06.11.2018
.
.

I would like to remove (or skip) all of the lines, where ID has more than one occurrence.

That means that all of the lines with IDs 222, 333 would be removed and lines with IDs 444, 555 (because only one occurrence of ID 444 and 555 exist in column) would be written into a new file.

My goal would be following:

name;id;date
steve;444;05.11.2018
kyle;555;06.11.2018
.
.

I can write to file and load csv to PHP myself, but I'm struggling to find a correct way to filter them like I mentioned above

I googled and tried many StackOverFlow examples, but these are only filtering unique lines (displaying IDs 222,333,444,555) and not filtering lines where IDs only occurs once in column.

Thank you!

Assad Ullah Ch · Accepted Answer

You can use 2D array to get it done. You can read the file line by line, explode every line and only pick the second substring (id) from each line and use it as index name of the external array and keep storing the values on internal array on the same index.

Your 2D array dump should look like this:

Array
(
    [id] => Array
        (
            [0] => name;id;date
        )

    [222] => Array
        (
            [0] => john;222;17.07.2018
            [1] => john;222;29.10.2018
        )

    [333] => Array
        (
            [0] => mary;333;01.11.2018
            [1] => mary;333;02.11.2018
        )

    [444] => Array
        (
            [0] => steve;444;05.11.2018
        )

    [555] => Array
        (
            [0] => kyle;555;06.11.2018
        )

)

Next, you need to loop through this array and only pick array on every index with count 1.

Here's the complete code:

// array to hold lines 
$lines = [];

// string type variale to hold final result
$contents = '';


// open the csv file
if (($handle = fopen("test.csv", "r")) !== false) {
    // read each line into an array
    while (($data = fgetcsv($handle, 8192, ",")) !== false) {

        // explode the string on semicolons
        $segment = explode( ';', $data[ 0 ] );

        // pick the second substring (the id) and use it as index in the $lines array and assign the read line to it
        $lines[ $segment[ 1 ] ][] = $data[ 0 ];

    }
    fclose($handle);
}



foreach( $lines as $line ){
    // only pick the inner arrays with one element to remove all double records.
    if( count( $line ) == 1 ){
        $contents .= $line[0] . "
";
    }
}

file_put_contents("unique_file.csv", $contents);

Let me know how it went.

Best of luck ;)

Remember! This solution is OK if you got a small .csv file. For larger files, there are other solutions.

Removing lines from CSV using PHP where there are more than one similar ID in a column

Answers (1)

Related Questions