Felix Hirschberg

Reputation: 33

replace line breaks in CSV

I have an issue regarding converting a CSV-String into an array.

INV;165;1;0;1 Username;0;10000;"Here is multiline-text.

with line-breaks:

";20 Offen;0,00
INV;166;1;0;1 Username2;0;10000;"Here is another multiline-text.

with line-breaks:

";20 Offen;0,00

I tried to split up the fields with str_getcsv, but the problem is that the delimiter only occurs in one field, and the function also splits up the multi-line fields.

My solution was to first convert the line breaks with preg_replace, but I can't get it to work. Here's my regex, which should replace only the line breaks enclosed by ;" and ";:

/(;")(.*)(\n)(.*)(";)/

This pattern only matches the first line break. Could anyone give me a hint on how to do this?

Thank you in advance.

Here is the original CSV:

CMXINV;165;1;0;1 Felix Hirschberg;0;10000;Herr;;Max;Muster;Company;;Street;123;City;DE;(0 40) 6 25 6;;(0 40) 6 25 6;[email protected];;;;;;;;0;20121217;20121217;1 Sofort ohne Abzug;EUR;1 Agentur;0 ;0,00;;"Vielen Dank für Ihren Auftrag.

Vereinbarungsgemäß berechnen wir Ihnen:

";"Mit besten Grüßen


Invoice Man";;0;0;0;0;;20 Offen;0,00;;0 ;0,00;0,00;;EXW;;;;;;;;;;;;;;;;2;;Project: Test-Project;;0,000;0,00;1,000;0,00;0,00;0;0;0;0;0
CMXINV;165;2;0;1 Felix Hirschberg;0;10000;Herr;;Max;Muster;Company;;Street;123;City;DE;(0 40) 6 25 6;;(0 40) 6 25 6;[email protected];;;;;;;;0;20121217;20121217;1 Sofort ohne Abzug;EUR;1 Agentur;0 ;0,00;;"Vielen Dank für Ihren Auftrag.

Vereinbarungsgemäß berechnen wir Ihnen:

";"Mit besten Grüßen


Invoice Man";;0;0;0;0;;20 Offen;0,00;;0 ;0,00;0,00;;EXW;;;;;;;;;;;;;;;;0;1;"- job1 (1h)
- job2 (1h)
- job3 (0,75h)
- job4 (1h)
- job5 (0,5h)";HR;3,25;100,00;1,00;0,00;325,00;1;0;0;0;0
MESSAGE;S;210053;INVOICE_GET hat 1 Datensätze zurückgegeben
MESSAGE;S;204020;Datenübertragung erfolgreich. Es wurden 1 Datensätze verarbeitet.

Upvotes: 3

Views: 3239

Answers (3)

Vyktor

Reputation: 21007

According to user comments in the PHP manual, both fgetcsv() and str_getcsv() should handle newlines correctly.

You should probably take advantage of those implementations (they should already have solved any issue you might come across).


Edit: own parser

Alternatively, you could write your own parser (based on a comment):

// Sketch of a character-by-character parser.
// Assumes $fp is an open, readable file handle.
$delimiter = ';';  // the data in the question uses ';'
$rows = [];        // all parsed rows
$row = [];         // fields of the current row
$value = '';       // field currently being read
$pending = false;  // true once the current row has any content

// Browse the file one character after another
while (false !== ($c = fgetc($fp))) {
    // Quoted value: read up to the closing quote, newlines included
    if (($value === '') && (($c == '"') || ($c == "'"))) {
        $quote = $c;
        $prevChar = '';

        while (false !== ($c = fgetc($fp))) {
            // Closing quote, unless it is escaped by a preceding backslash
            if (($c == $quote) && ($prevChar != '\\')) {
                break;
            }

            // After an escaped backslash ("blah blah \\"), reset $prevChar
            // so the closing quote is not mistaken for an escaped one
            if (($c == '\\') && ($prevChar == '\\')) {
                $prevChar = '';
            } else {
                $prevChar = $c;
            }

            $value .= $c;
        }

        $pending = true;
        continue;
    }

    switch ($c) {
        // Delimiter: the current field is complete
        case $delimiter:
            $row[] = $value;
            $value = '';
            $pending = true;
            break;

        // Outside a quoted value, a newline ends the row
        case "\n":
        case "\r":
            if ($pending || ($value !== '')) {
                $row[] = $value;
                $rows[] = $row;
            }
            $row = [];
            $value = '';
            $pending = false;
            break;

        // Normal, unquoted character
        default:
            $value .= $c;
            break;
    }
}

// Flush the last row if the input does not end with a newline
if ($pending || ($value !== '')) {
    $row[] = $value;
    $rows[] = $row;
}

Upvotes: 1

Ilmari Karonen

Reputation: 50378

If you have the CSV input in a file, you can just use fgetcsv(), which will handle multi-line entries just fine.

If the CSV input is in a string, you can use the special php://temp I/O stream to efficiently pass it to fgetcsv():

$fp = fopen( 'php://temp', 'w+' );
fputs( $fp, $csv );
rewind( $fp );
// Reads the first record only; call fgetcsv() in a loop to read them all
$data = fgetcsv( $fp, 0, ';', '"' );
fclose( $fp );
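
To read every record rather than just the first one, the same idea can be wrapped in a loop. The following is a minimal sketch, not a definitive implementation: the function name csv_string_to_rows and the sample data are made up for illustration, and the escape character is passed explicitly as an assumption about the input.

```php
<?php
// Hypothetical helper: parses a whole CSV string into an array of rows.
function csv_string_to_rows(string $csv, string $delimiter = ';'): array
{
    // Write the string to an in-memory stream so fgetcsv() can read it
    $fp = fopen('php://temp', 'w+');
    fwrite($fp, $csv);
    rewind($fp);

    $rows = [];
    // fgetcsv() returns one record per call and false at the end of input.
    // The escape character is passed explicitly ('\\' is the historical default).
    while (($row = fgetcsv($fp, 0, $delimiter, '"', '\\')) !== false) {
        // A blank line comes back as an array holding a single null; skip it
        if ($row === [null]) {
            continue;
        }
        $rows[] = $row;
    }
    fclose($fp);

    return $rows;
}

// Shortened sample in the shape of the data from the question
$csv = "INV;165;\"Here is multiline-text.\n\nwith line-breaks:\n\n\";20 Offen;0,00\n"
     . "INV;166;\"Here is another multiline-text.\n\";20 Offen;0,00\n";

$rows = csv_string_to_rows($csv);
```

Each element of $rows is one record, and the quoted field keeps its embedded line breaks, so no preprocessing with a regex is needed.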

Upvotes: 0

Gareth Cornish

Reputation: 4356

You could try this:

/;"(([^"]*)([\r\n])+([^"]*))+"/im

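This matches the text before and after a newline within the ;" delimiters. The second capture group holds the preceding text, and the fourth holds the following text. A capture group keeps only one value per match, so a field with several line breaks is not fully covered by reading the groups once.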

Note that I have left off the last ';' to ensure that this will still match if the multi-line value is the last in the line.
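
If the goal is to replace every line break inside a quoted field, a callback that rewrites each quoted field as a whole may be easier to handle. The sketch below uses a different technique from the pattern above: it matches complete quoted fields and replaces the line breaks inside them. The function name flatten_quoted_line_breaks is hypothetical, and it assumes that quotes inside a field are escaped by doubling them.

```php
<?php
// Hypothetical helper: replaces line breaks inside double-quoted fields.
// Assumes quotes inside a field are escaped by doubling them ("").
function flatten_quoted_line_breaks(string $csv, string $replacement = ' '): string
{
    $result = preg_replace_callback(
        // One complete quoted field, including any doubled quotes
        '/"(?:[^"]++|"")*+"/',
        function (array $m) use ($replacement): string {
            // Collapse each run of CR/LF characters into one replacement
            return preg_replace('/[\r\n]+/', $replacement, $m[0]);
        },
        $csv
    );

    // preg_replace_callback() returns null on error; fall back to the input
    return $result ?? $csv;
}

// Shortened sample in the shape of the data from the question
$csv = "INV;165;\"Here is multiline-text.\n\nwith line-breaks:\n\n\";20 Offen;0,00\n"
     . "INV;166;\"Another text.\n\";20 Offen;0,00\n";

$flat = flatten_quoted_line_breaks($csv);

// Every record now sits on one physical line, so it can be split per line
$records = array_map(
    fn (string $line): array => str_getcsv($line, ';', '"', '\\'),
    explode("\n", trim($flat))
);
```

Note that this changes the field contents (line breaks become spaces), which is only acceptable if the original layout of the text does not matter.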

Upvotes: 2
