Reputation: 4018
I noticed that str_getcsv
doesn't seem to enclose the first value it receives in double quotation marks, even when the string data is passed this way.
In the example below, the first value in the 3rd row is "Small Box, But Smaller"
, but after running it through str_getcsv
it becomes Small Box, But Smaller
(without double quotation marks). Like this:
// multi-line csv string
$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box","For storing magic beans.",2
"Small Box, But Smaller","Not sure why we need this.",0
CSV;
// split string into rows (don't use explode in case multi-line values exist)
$csvRows = str_getcsv($csvString, "\n"); // parse rows
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Outputs:
Array
(
[0] => Title,"Description",Quantity
[1] => Small Box,"For storing magic beans.",2
[2] => Small Box, But Smaller,"Not sure why we need this.",0
)
The problem this causes is that now if each row is parsed using str_getcsv
, a comma in the first value makes it split into two rows. If it keeps running this:
foreach($csvRows as &$csvRow) {
$csvRow = str_getcsv($csvRow); // parse each row into values and save over original array value
}
unset($csvRow); // clean up
// output
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Outputs:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box
[1] => But Smaller
[2] => Not sure why we need this.
[3] => 0
)
)
The problem is in the last array value, which is an array of 4 keys instead of 3. It's split on the comma of the value "Small Box, But Smaller"
.
On the other hand, parsing just one row string works:
$csvRowData = '"Small Box, But Smaller","Not sure why we need this.",0';
$csvValues = str_getcsv($csvRowData);
echo '<pre>';
print_r($csvValues);
echo '</pre>';
Outputs:
Array
(
[0] => Small Box, But Smaller
[1] => Not sure why we need this.
[2] => 0
)
Why is this happening and how do I solve the problem with multi-line CSV data? Is there a best practice for working with multi-line CSV data when it is a string and is not read directly from a file? Also, I need to handle multi-line values, such as "foo \n bar"
so I can't just use explode()
instead of the first str_getcsv()
.
Upvotes: 4
Views: 2665
Reputation: 659
I don't know why you PHP_EOL is not working correctly as it does on my server however I did encounter this problem before.
The approach I took goes as follows.
Firstly I like to make sure all my fields are surrounded by double quotes regardless of the value in the field so to use your example text (with some slight modifications):
// multi-line csv string
$csvString = <<<CSV
"Title","Description","Quantity"
"Small Box","For storing magic beans.","2"
"Small Box, But Smaller","Not sure why we need this.","0"
"a","\n","b","c"
CSV;
$csvString .= '"a","' . "\n" . '","' . PHP_EOL . '","c"';
Secondly I target solo PHP_EOL that may be lingering in values so I can replace any "PHP_EOL" strings with "\r\n"
// Clear any solo end of line characters that are within values
$csvString = str_replace('","' . PHP_EOL . '"', '",""',$csvString);
$csvString = str_replace('"' . PHP_EOL . '","', '"","',$csvString);
$csvString = str_replace('"' . PHP_EOL . '"', '"'. "\r\n" . '"',$csvString);
and then finally this allows me to use the php explode function and display output:
$csvArr = explode("\r\n",$csvString);
foreach($csvArr as &$csvRow) {
$csvRow = str_getcsv($csvRow); // parse each row into values and save over original array value
}
unset($csvRow); // clean up
// output
echo '<pre>';
print_r($csvArr);
echo '</pre>';
Which outputs:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box, But Smaller
[1] => Not sure why we need this.
[2] => 0
)
[3] => Array
(
[0] => a
[1] =>
[2] => b
[3] => c
)
[4] => Array
(
[0] => a
[1] =>
[2] =>
[3] => c
)
)
As you can see from the output the new line characters are not targeted, just the PHP_EOL.
Upvotes: 0
Reputation: 4018
After much headache I think I understand the problem now. According to the PHP folks, "str_getcsv() is designed to parse a single CSV record into fields" (see https://bugs.php.net/bug.php?id=55763). I discovered that using str_getcsv()
for multiple rows causes these not-so-well documented problems:
I solved the issue by creating a temporary file and writing the CSV content to it. Then I read the file using fgetcsv()
, which did not result in the 2 issues I described above. Example code:
// multi-line csv string
$csvString = <<<'CSV'
"Title","Description",Quantity
"Small Box","For storing magic beans.",2
"Small Box, But Smaller","This value
contains
multiple
lines.",0
CSV;
// ^ notice the multiple lines in the last row's value
// create a temporary file
$tempFile = tmpfile();
// write the CSV to the file
fwrite($tempFile, $csvString);
// go to first character
fseek($tempFile, 0);
// track CSV rows
$csvRows = array();
// read the CSV temp file line by line
while (($csvColumns = fgetcsv($tempFile)) !== false) {
$csvRows[] = $csvColumns; // push columns to array (really it would be more memory-efficient to process the data here and not append to an array)
}
// Close and delete the temp file
fclose($tempFile);
// output
echo '<pre>';
print_r($csvRows);
echo '</pre>';
Results in:
Array
(
[0] => Array
(
[0] => Title
[1] => Description
[2] => Quantity
)
[1] => Array
(
[0] => Small Box
[1] => For storing magic beans.
[2] => 2
)
[2] => Array
(
[0] => Small Box, But Smaller
[1] => This value
contains
multiple
lines.
[2] => 0
)
)
I'll also add that I found some options on GitHub, and 2 major projects for PHP 5.4+ and PHP 5.5+. However, I am still using PHP 5.3 and only saw options with limited activity. Furthermore, some of those processed CSV strings by writing to files and reading them out also.
I should also note that the documentation for PHP has some comments about str_getcsv()
not being RFC-compliant: http://php.net/manual/en/function.str-getcsv.php. The same seems to be true for fgetcsv()
yet the latter did meet my needs, at least in this case.
Upvotes: 4