Andi S.

Reputation: 339

Regex for an incorrectly quoted CSV file

I have a problem with a corrupt CSV file. It arrives like this:

column1,column2,column3,column4,column5,column6
123,"some text",""column3 text"",""still column3 text"",4,234,""
123,"some text",""column3 text"",4,234,""

As a table it should look like this:

column1 | column2   | column3                            | column4 | column5 | column6
123     | some text | "column3 text, still column3 text" | 4       | 234     | 
123     | some text | "column3 text"                     | 4       | 234     |

I am reading the file with PHP and tried to parse it into an array with str_getcsv. But because of the broken quotes it doesn't work: I always get more columns than headers.

I don't actually need the value of column 3, so I tried to write a regex with three groups and then use preg_replace. But I can't come up with a regex that works for both lines.

This regex matches only the first line: https://regex101.com/r/OjTAAC/1

and this one matches only the second line: https://regex101.com/r/I2xqPs/1

Can anybody help me find a regex that works for both cases?

Upvotes: 3

Views: 61

Answers (1)

ArtisticPhoenix

Reputation: 21661

There may be a simpler solution. I would back up the file or work on a copy, though, and you may have to do it differently if the file is large.

Let's try something different:

//$str = '123,"some text",""column3 text"",""still column3 text"",4,234,""';
//$str = '123,"some text",""column3 text"",4,234,""';

// 'file.csv' is a placeholder; use the path to your file
$handle = fopen('file.csv', 'r');

while (($str = fgets($handle, 4096)) !== false) {
    // strip the line ending, then remove every quote
    $str = str_replace('"', '', rtrim($str, "\r\n"));
    $line = explode(',', $str);

    // combine line items 2 and 3
    if (count($line) == 7) {
        $line[2] .= ', ' . $line[3];
        // remove item 3 and reindex
        unset($line[3]);
        $line = array_values($line);
    }
    print_r($line);
}

fclose($handle);

As long as the lines are consistent with what you show (column 3 split into at most two pieces, and no commas inside the other fields), it should work.
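If column 3 can be split into more than two pieces, checking for exactly 7 items is no longer enough. Here is a minimal sketch of one way around that, assuming there are always 6 real columns and that columns 1, 2, 4, 5 and 6 never contain a comma: take the fixed columns from both ends and join whatever is left in the middle. The name `parseBrokenLine` is made up for this example.

```php
<?php
// Hypothetical helper: assumes 6 expected columns and that only
// column 3 can be split into several comma-separated pieces.
function parseBrokenLine(string $str): array
{
    // strip the line ending and every quote, then split on commas
    $parts = explode(',', str_replace('"', '', rtrim($str, "\r\n")));

    if (count($parts) < 6) {
        return $parts; // fewer fields than expected: leave untouched
    }

    $head   = array_slice($parts, 0, 2);                  // column1, column2
    $tail   = array_slice($parts, -3);                    // column4, column5, column6
    $middle = array_slice($parts, 2, count($parts) - 5);  // every piece of column3

    return array_merge($head, [implode(', ', $middle)], $tail);
}
```

The sample run that follows still uses the simpler fixed-count check from the loop above.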

$array = [
    '123,"some text",""column3 text"",""still column3 text"",4,234,""',
    '123,"some text",""column3 text"",4,234,""'
];

foreach ($array as $str) {
    $str = str_replace('"', '', $str);
    $line = explode(',', $str);

    // combine line items 2 and 3
    if (count($line) == 7) {
        $line[2] .= ', ' . $line[3];
        // remove item 3 and reindex
        unset($line[3]);
        $line = array_values($line);
    }
    print_r($line);
}

Outputs

Array
(
    [0] => 123
    [1] => some text
    [2] => column3 text, still column3 text
    [3] => 4
    [4] => 234
    [5] =>
)
Array
(
    [0] => 123
    [1] => some text
    [2] => column3 text
    [3] => 4
    [4] => 234
    [5] =>
)

You can test it here.

http://sandbox.onlinephpfunctions.com/code/f39eb94ccef045213a30385cc7daa326ce3aa25d
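If you would still rather use a regex, and since you said you don't need the value of column 3, here is a rough sketch that blanks out that column and then lets str_getcsv do the rest. It assumes column 1 contains no comma, column 2 is quoted, and the last three columns contain no commas; the helper name `dropBrokenColumn` is made up for this example, and I have only checked the pattern against your two sample lines.

```php
<?php
// Hypothetical helper: replaces the broken column 3 with an empty field,
// assuming the columns around it are well-formed.
function dropBrokenColumn(string $line): string
{
    // group 1: column1 and the quoted column2, including the trailing comma
    // "".*"": the broken column 3, however many pieces it has (greedy)
    // group 2: the last three columns, each without a comma
    $pattern = '/^([^,]*,"[^"]*",)"".*""(,[^,]*,[^,]*,[^,]*)$/';

    // fall back to the untouched line if preg_replace reports an error
    return preg_replace($pattern, '$1$2', rtrim($line, "\r\n")) ?? $line;
}

$lines = [
    '123,"some text",""column3 text"",""still column3 text"",4,234,""',
    '123,"some text",""column3 text"",4,234,""',
];

foreach ($lines as $line) {
    // the escape character is passed explicitly rather than relying on its default
    print_r(str_getcsv(dropBrokenColumn($line), ',', '"', '\\'));
}
```

Lines that don't match the pattern are returned unchanged, so a correctly quoted row would pass through as it is.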

Upvotes: 1
