Robin Bantjes
Robin Bantjes

Reputation: 299

Parsing unescaped double quotes with fgetcsv

I'm importing a CSV file into a MySQL database. To parse the CSV I'm using fgetcsv(). The CSV contains " characeters that have not been escaped and causes and error array_combine(): Both parameters should have an equal number of elements

The CSV data is in this format:

"GR109     "," ",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0," ","GRANT     ","M          ","W",0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0.0,0.0," ",0,0,0.0," "," "," ",2.42,0.0,0.0,0.0,0.0," "," "," "," "," "," ",0.0,0.0,0.0,0.0,0.0," "," "," "," ","SELF COL  ","16 P PR.  "," ","PLAIN     "," ","R/E1ROW   "," "," "," "," "," "," "," ","R/E1ROW   ","BEADED    "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," "," ","GRANT     ","GRANT     "," "," ","  "," ",0.0,"  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","  ","VAMP LNG - BLK. CARAVELLE P/S. QTR. LNG./ TNG.LINING - BLK. TORINO. (GREY ""TORINO"" FOR LIZARD.)","GR109 COMPLETE"

My code:

function csv_to_array($filename='', $delimiter=',', $enclosure='"', $escapestring='"')
{
if(!file_exists($filename) || !is_readable($filename))
    return FALSE;

$header = NULL;
$data = array();
if (($handle = fopen($filename, 'r')) !== FALSE)
{
    while (($row = fgetcsv($handle, 1000, $delimiter, $enclosure, $escapestring)) !== FALSE)
    {
        if(!$header)
            $header = $row;
        else
            $data[] = array_combine($header, $row);
    }
    fclose($handle);
}
return $data;

}

I added $escapestring='"', but that didn't help. Is the error from ""TORINO"" where the " character is not escaped? If so, is there a way to handle these fields?

Upvotes: 2

Views: 6485

Answers (3)

Deus777
Deus777

Reputation: 1836

Ok. I think I found it. Definitely second parameter in fgetcsv, which is the length of the line, messed up a little. Just change it to 0 (function could work quite slower), or double it. When it was 1000 it was cutting each row into two rows, one with length exactly 1000 chars (cutting was done even in the middle of the world) and second row being the rest of the line up to newline character. So $row variable at first was an array with 117 length, then about 13, again 117 and 13.

Just change this:

while (($row = fgetcsv($handle, 1000, $delimiter, $enclosure, $escapestring)) !== FALSE)

to this:

while (($row = fgetcsv($handle, 0, $delimiter, $enclosure, $escapestring)) !== FALSE)

Upvotes: 0

Pedro Lobito
Pedro Lobito

Reputation: 98861

$result = preg_replace('/"((?=[^"]*)(?=(?=[^"]*)))"/', '$1', $subject);

The regex above will remove double-quotes inside double-quotes. It will for work with:

  1. "some text "inside quotes" more text"

  2. ""inside quotes" more text"



Ideone demo

Upvotes: 0

arkascha
arkascha

Reputation: 42885

Your code works perfectly fine for me. In the example below I removed the header handling, since your example data does not contain any header.

<?php

function csv_to_array($filename='', $delimiter=',', $enclosure='"', $escapestring='"')
{
    if(!file_exists($filename) || !is_readable($filename))
        return FALSE;

    $data = array();
    if (($handle = fopen($filename, 'r')) !== FALSE) {
        while (($row = fgetcsv($handle, 1000, $delimiter, $enclosure, $escapestring)) !== FALSE) {
            $data[] = $row;
        }
        fclose($handle);
    }
    return $data;
}

$result = csv_to_array('test.csv');
foreach ($result as $key=>$element) {
    echo $key . ' => ' . print_r($element,true) . "\n";
}

If the file test.csv contains your given csv string, then the obvious output is (shortened):

0 => Array
(
    [0] => GR109     
    [1] =>  
    [2] => 0.0
    [3] => 0.0
    // .....
    [127] =>
    [128] => VAMP LNG - BLK. CARAVELLE P/S. QTR. LNG./ TNG.LINING - BLK. TORINO. (GREY "TORINO" FOR LIZARD.)
    [129] => GR109 COMPLETE
)

Upvotes: 1

Related Questions