FMoridhara

Reputation: 161

str_getcsv does not parse data properly when the escape character appears immediately before the enclosure

I need to parse a CSV string into an array. I am using PHP's str_getcsv(), and it worked fine until I hit the case below.

$line = 'field1,field2,field3,"this is field having backslash at end\",anothersomeval';
$arrField = str_getcsv($line, ",", '"');
echo count($arrField);

I expected the count to be 5, but it is actually 4. I searched for this issue but could not find a proper solution. I suspect it is a problem with str_getcsv(), though I could not find any bug report about it.

I need a proper CSV parsing mechanism; I cannot simply split the values on the field delimiter or explode() the string.

Where am I going wrong with the code above?

Upvotes: 2

Views: 3278

Answers (3)

Tama

Reputation: 311

First of all, @user2395126's solution is good. In a comment I suggested defining $csvIn differently, using single quotes instead of double quotes as the PHP string delimiter, so you don't have to escape every double quote and backslash:

//CSV content with a backslash as last character
$csvIn = '"test 1", "test 2", "test 3\", "test 4"';

The rest of the solution is the same as @user2395126's.

Then I tried another approach: escaping the backslashes before calling str_getcsv(). This gives an intermediate array in which every backslash is still doubled, so a final step is needed to turn each double backslash back into a single one.

Here is my complete solution:

//CSV content with a backslash as last character
$csvIn = '"test 1", "test 2", "test 3\", "test 4"';

// Escape backslashes
$csvIn = str_replace("\\", "\\\\", $csvIn);

$csvArray = str_getcsv($csvIn, ',', '"');

//output the intermediate result, in which each backslash is still doubled
print_r($csvArray);

//replace each double backslash with a single one
foreach($csvArray as $key => $item) {
  $csvArray[$key] = str_replace("\\\\", "\\", $item);
}

//output the clean results
print_r($csvArray);

Upvotes: 0

user2395126

Reputation: 546

I had the same problem. I worked around it with this band-aid fix, which seems to work fine until the function gets an option for not using an escape character at all.

//messy CSV content
$csvIn = "\"test 1\", \"test 2\", \"test 3\\\", \"test 4\"";

//we will use the ASCII device control 1 character, this should not be in your CSV input
//to make sure it is not, replace all occurrences with an empty string
$csvIn = str_replace("\x11", "", $csvIn);

//convert the CSV to an array using str_getcsv() and our non-existent escape character
//make sure the escape character is written in double quotes; in single quotes '\x11' is four literal characters, not one, so it will not work
$csvArray = str_getcsv($csvIn, ',', '"', "\x11");

//output the clean results
print_r($csvArray);

Upvotes: 1

clapas

Reputation: 1846

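The fourth argument to str_getcsv() sets the escape character; the default escape character is a backslash. In your string, the backslash right before the closing double quote escapes that quote, so the enclosure is never closed and the rest of the line ends up in the fourth field.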
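To see this effect, here is a minimal sketch that reuses the question's $line and passes the default backslash escape explicitly (the same value str_getcsv() uses when the fourth argument is omitted), then inspects the fourth field:

```php
<?php
$line = 'field1,field2,field3,"this is field having backslash at end\",anothersomeval';

// Same as the question's call; "\\" is the default escape character, passed explicitly.
$fields = str_getcsv($line, ",", '"', "\\");

echo count($fields), "\n"; // 4

// The escaped quote does not close the enclosure, so this field runs to the
// end of the string and also contains ",anothersomeval".
var_dump($fields[3]);
```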

If the backslash has no special meaning in your CSV string and you want it treated as a literal character, call str_getcsv() with a different escape character that you can guarantee won't appear in the CSV string, e.g. '#':

$arrField = str_getcsv($line, ",", '"', '#');
echo count($arrField); // 5
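A sketch of a reusable variant, assuming PHP 7.4 or later for the main path: since PHP 7.4, str_getcsv() accepts an empty string as the escape argument, which disables the escape mechanism so no placeholder character is needed. The helper name parseCsvLiteralBackslash is hypothetical, and the pre-7.4 fallback borrows the "\x11" trick from user2395126's answer:

```php
<?php
// Hypothetical helper: parse one CSV line, treating backslash as an ordinary character.
function parseCsvLiteralBackslash($line)
{
    if (PHP_VERSION_ID >= 70400) {
        // PHP 7.4+: an empty escape string disables the escape mechanism entirely.
        return str_getcsv($line, ",", '"', "");
    }
    // Older PHP: use an escape character assumed never to occur in the data
    // (ASCII DC1), stripping any occurrences first to be safe.
    return str_getcsv(str_replace("\x11", "", $line), ",", '"', "\x11");
}

$line = 'field1,field2,field3,"this is field having backslash at end\",anothersomeval';
$fields = parseCsvLiteralBackslash($line);

echo count($fields), "\n"; // 5
echo $fields[3], "\n";     // this is field having backslash at end\
```

The drawback of any single-character placeholder such as '#' is that the parser still treats it as an escape if it ever occurs in the data; the empty-string form avoids that on PHP 7.4+.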

Upvotes: 3
