David Bélanger

Reputation: 7438

Regex return expected result plus line break on another line

Before I start: I know this is CSV, and I know PHP has a built-in function for it. I have the following pattern:

preg_match_all("/([^\"]|\"[^\"]*\")*?(r\n|\n\r|\r|\n)/i", $CSV, $Matches);

It is meant to parse something like this:

Country,Region/State,City,"Zip/Postal Code\n From","Zip/Postal Code To","Weight From","Weight To","Shipping Price","Delivery Type"\n\r
CAN,*,,,,0.0000,4999.0000,29.7500,Priority\n\r
CAN,*,,,,10000.0000,19999.0000,35.5000,Express\n\r
CAN,*,,,,0.0000,4999.0000,19.7500,Express\n\r
CAN,*,,,,20000.0000,99999999.9999,59.0000,Priority\n\r
CAN,*,,,,5000.0000,9999.0000,34.7500,Priority\n\r
CAN,*,,,,20000.0000,99999999.9999,41.5000,Express\n\r
CAN,*,,,,5000.0000,9999.0000,24.4500,Express\n\r
CAN,*,,,,10000.0000,19999.0000,48.0000,Priority\n\r
CAN,*,,,,10000.0000,19999.0000,29.7500,Standard\n\r
CAN,*,,,,20000.0000,99999999.9999,36.5000,Standard\n\r
CAN,*,,,,500.0000,9999.0000,20.3500,Standard\n\r
CAN,*,,,,90.0000,499.0000,9.7500,Standard\n\r
CAN,*,,,,50.0000,89.0000,1.8000,Standard\n\r
CAN,*,,,,30.0000,49.0000,1.5000,Standard\n\r
CAN,*,,,,0.0000,29.0000,1.0000,Standard\n\r
USA,*,,,,20000.0000,99999999.9999,160.0000,Express\n\r
USA,*,,,,10000.0000,14999.0000,76.0000,Express\n\r
USA,*,,,,1000.0000,4999.0000,42.0000,Express\n\r
USA,*,,,,15000.0000,19999.0000,155.0000,Priority\n\r
USA,*,,,,5000.0000,9999.0000,94.0000,Priority\n\r
USA,*,,,,0.0000,999.0000,75.5000,Priority\n\r
USA,*,,,,15000.0000,19999.0000,98.0000,Express\n\r
USA,*,,,,5000.0000,9999.0000,61.5000,Express\n\r
USA,*,,,,0.0000,999.0000,40.0000,Express\n\r
USA,*,,,,20000.0000,99999999.9999,230.0000,Priority\n\r
USA,*,,,,10000.0000,14999.0000,120.0000,Priority\n\r
USA,*,,,,1000.0000,4999.0000,61.5000,Priority\n\r
USA,*,,,,500.0000,999.0000,25.5000,Standard\n\r
USA,*,,,,90.0000,499.0000,13.3500,Standard\n\r
USA,*,,,,50.0000,89.0000,3.0000,Standard\n\r
USA,*,,,,30.0000,49.0000,1.8000,Standard\n\r
USA,*,,,,0.0000,29.0000,1.5000,Standard\n\r

The result I get is similar to:

[2] => Array
    (
    )

[3] => Array
    (
        [0] => CAN
        [1] => *
        [2] => 
        [3] => 
        [4] => 
        [5] => 10000.0000
        [6] => 19999.0000
        [7] => 35.5000
    )

[4] => Array
    (
    )

[5] => Array
    (
        [0] => CAN
        [1] => *
        [2] => 
        [3] => 
        [4] => 
        [5] => 0.0000
        [6] => 4999.0000
        [7] => 19.7500
    )

[6] => Array
    (
    )

If I add ?: to the line-break group, it still does this. Can anyone help? I am stuck here. Thanks.

Upvotes: 1

Views: 120

Answers (1)

Carl Walsh

Reputation: 6959

Not knowing the particulars of PHP matching, I'll take your word that the regex behaves the way you show (with my preferred regex engine I'm not capturing in the same way).

I'll assume you are trying to remove those blank matches. I'll also assume that those "newlines" are real line-break characters in the input, and not literal backslashes followed by r or n.

The problem seems to be that the "newlines" are being matched twice. Perhaps you match just the \n on one pass, and then the \r on the next pass?

The simplest solution would be to restrict the line break to the one type you know the file has: /([^\"]|\"[^\"]*\")*?(\n\r)/. Does this help?
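As a rough sketch of that suggestion, here is the restricted pattern run against two rows copied from the question. It assumes the input really does end every line with the two-character sequence \n\r, as the sample shows. The variable names ($csv, $matches, $records) are only illustrative.

```php
<?php
// Two rows from the question; each line is ASSUMED to end with "\n\r".
$csv = "CAN,*,,,,0.0000,4999.0000,29.7500,Priority\n\r"
     . "CAN,*,,,,10000.0000,19999.0000,35.5000,Express\n\r";

// Same pattern as above, with only "\n\r" accepted as the line terminator.
preg_match_all('/([^"]|"[^"]*")*?(\n\r)/', $csv, $matches);

// $matches[0] holds each full match (record plus terminator),
// so strip the terminator from the end of every record.
$records = array();
foreach ($matches[0] as $match) {
    $records[] = rtrim($match, "\n\r");
}

print_r($records);
```

With this input there should be exactly one match per line and no empty entries in between. If the file actually uses \r\n, the terminator in the pattern would need to be changed to match.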

Alternatively, I would just split the input into lines and then use a regex split (delimited by commas) on each line.
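A minimal sketch of the split approach, again assuming \n\r line endings. Note the limitation: a plain comma split does not understand quoted fields, so it would break the header row from the question (which has a line break inside a quoted field) and any field containing a comma. It is only safe for simple rows like the data lines.

```php
<?php
// Two data rows from the question; "\n\r" line endings are ASSUMED.
$csv = "CAN,*,,,,0.0000,4999.0000,29.7500,Priority\n\r"
     . "USA,*,,,,0.0000,29.0000,1.5000,Standard\n\r";

// Split into lines on the terminator, dropping the empty piece after the last one.
$lines = preg_split('/\n\r/', $csv, -1, PREG_SPLIT_NO_EMPTY);

$rows = array();
foreach ($lines as $line) {
    // Split each line on commas; empty fields are kept as empty strings.
    $rows[] = preg_split('/,/', $line);
}

print_r($rows);
```

Each row comes back as a nine-element array, with the three empty columns preserved as empty strings.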

Upvotes: 1
