Reputation: 581
I have a CSV file like this:
14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "Lorem \n ipsum \n dolor sit" \n
15 ; 234,16 ; 10204 ; "ABC" ; "DFG" ; "Lorem \n ipsum \n dolor sit" \n
16 ; 1234,15 ; 10304 ; "CCC" ; "DFG" ; "Lorem ipsum/dolor \n sit amet\consec" \n
and so on...
The file has almost 550,000 lines. How can I replace all the \n characters inside double quotes in one pass?
I'm using PHP 5. Could this be done with preg_replace()?
Upvotes: 0
Views: 4233
Reputation: 834
I don't know whether you're using fgetcsv(), but it can be configured to recognize individual fields, including quoted ones.
That way you can read the records in one at a time and strip the newline characters at the field level, rather than running an expensive regex over a large file all at once.
Slightly modified PHP example from the documentation (delimiter changed to ';'):
$row = 1;
$handle = fopen("data.txt", "r");
if ($handle !== FALSE) {
    // Length 0 means no line-length limit; ";" is the field delimiter.
    while (($data = fgetcsv($handle, 0, ";")) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c = 0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
    fclose($handle);
}
data.txt:
14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "text
text
more text"
15 ; 234,16 ; 10204 ; "ABC" ; "DFG" ; "text
text
more text"
This will be read as 2 records instead of 6 lines, because fgetcsv() treats the newline characters inside the quotes as part of the field and not as the start of a new record.
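A quick way to check that behaviour is to count the records fgetcsv() returns for a small sample. This is only a sketch: countCsvRecords() is a made-up helper name, the sample omits the spaces around the ';' delimiters for simplicity, and passing the enclosure and escape arguments explicitly assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper (not from the thread): counts the records that
// fgetcsv() finds in a CSV string.
function countCsvRecords($csv, $delimiter = ';')
{
    // php://memory lets the string be read like a file for this demo.
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $records = 0;
    // Length 0 = no line-length limit; '"' is the enclosure, '\\' the escape.
    while (($data = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        // fgetcsv() reports a blank line as array(null); skip those.
        if ($data === array(null)) {
            continue;
        }
        $records++;
    }
    fclose($handle);

    return $records;
}

// Two records, each with real newlines inside the last quoted field.
$sample = "14;1234,56;10203;\"ABC\";\"DFG\";\"text\ntext\nmore text\"\n"
        . "15;234,16;10204;\"ABC\";\"DFG\";\"text\ntext\nmore text\"\n";

echo countCsvRecords($sample), "\n"; // prints 2
```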
Upvotes: 3
Reputation: 778
I'm not too well versed in extremely complex regexes, so assuming you're looking for a one-time conversion, I would write a quick script that opens the CSV in PHP, reads it record by record with fgetcsv() (built into PHP 5), and writes each record to a new file with fputcsv(), using str_replace() to remove the newline characters along the way.
(That is, if I weren't looking for the monster regex on Stack Overflow.)
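A rough sketch of that one-time conversion could look like the following. The function name flattenCsvNewlines() and the choice of a single space as the replacement are assumptions, and the explicit escape argument to fputcsv() assumes PHP 5.5.4 or later (drop it on older versions). Note that fputcsv() only quotes fields that need quoting, so the output may not be quoted exactly like the input.

```php
<?php
// Hypothetical helper (not from the thread): copies a ';'-delimited CSV file,
// replacing real newlines inside fields with $replacement.
// Returns the number of records written, or false if a file cannot be opened.
function flattenCsvNewlines($inPath, $outPath, $replacement = ' ')
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }

    $rows = 0;
    // Read one record at a time so the whole file never sits in memory.
    while (($fields = fgetcsv($in, 0, ';', '"', '\\')) !== false) {
        if ($fields === array(null)) {
            continue; // blank line
        }
        foreach ($fields as $i => $field) {
            // Handle Windows, old Mac and Unix line endings.
            $fields[$i] = str_replace(array("\r\n", "\r", "\n"), $replacement, $field);
        }
        fputcsv($out, $fields, ';', '"', '\\');
        $rows++;
    }

    fclose($in);
    fclose($out);

    return $rows;
}
```

Called as flattenCsvNewlines('data.txt', 'data_clean.txt'), where both file names are placeholders, it should leave one record per physical line in the new file.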
Upvotes: 0
Reputation: 545985
So do you actually have the literal string '\n' (a backslash followed by n, not a newline character) on some lines? If so, you just need to escape the backslash:
$csv = str_replace("\\n", "*foo*", $csv);
// This changes a line like:
14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "text \n text \n more text" \n
// into:
14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "text *foo* text *foo* more text" \n
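If the literal \n sequences should only be touched inside double quotes, as the question asks, one possible approach is preg_replace_callback(): match each quoted span, then run str_replace() on that span alone. This is a sketch under a few assumptions: the helper name is invented, closures require PHP 5.3 or later, and the simple pattern treats every pair of double quotes as one field, so it has not been checked against unusual quoting.

```php
<?php
// Hypothetical helper (not from the thread): replaces the two-character
// sequence backslash + n, but only inside double-quoted spans.
function replaceEscapedNewlinesInQuotes($line, $replacement = ' ')
{
    return preg_replace_callback(
        '/"[^"]*"/', // one double-quoted span
        function ($match) use ($replacement) {
            // In single quotes, '\\n' is a literal backslash followed by n.
            return str_replace('\\n', $replacement, $match[0]);
        },
        $line
    );
}

// Single-quoted string, so every \n here is literal text, not a newline.
$line = '14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "text \n text \n more text" \n';

echo replaceEscapedNewlinesInQuotes($line, '*foo*'), "\n";
// prints: 14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "text *foo* text *foo* more text" \n
```

The trailing \n after the last closing quote is left alone because it lies outside any quoted span.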
Upvotes: 0