Doug McK

Reputation: 468

Split string by newlines

This question is very similar to "use preg_split instead of split", but I'm confused about the regex and would like to clear that up.

I'm trying to update some existing split() calls to use preg_split() instead, and I'm getting some unclear results. Running the code below gives me arrays of different lengths and I'm not sure why.

From what I can see, split is matching on \n with an optional \r beforehand. I think preg_split() is doing the same, but then why is it creating 2 splits? Is this to do with lazy/greedy matching?

Demo code :

$test = "\r\n";

$val = split('\r?\n', $test); //literal interpretation of string
$val_new = split("\r?\n", $test); //php understanding that these are EOL chars
$val2 = preg_split('/\r?\n/', $test);

var_dump($val); // returns array(1) { [0]=> string(2) " " }
var_dump($val2); // returns array(2) { [0]=> string(0) "" [1]=> string(0) "" }

Edit: added $val_new based on Kolink's comments, because they helped clear up my understanding of the problem and may be of use to others too.

Upvotes: 1

Views: 1093

Answers (2)

anubhava

Reputation: 785126

You should pass the PREG_SPLIT_NO_EMPTY flag to preg_split to drop empty tokens from the resulting array. The flag is the 4th argument; the 3rd is the limit, where -1 means no limit. So if you use

preg_split('/\r?\n/', $test, -1, PREG_SPLIT_NO_EMPTY);

then the empty strings are left out of the result.
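As a rough illustration of how the flag changes the result, here is a minimal sketch. The sample text in $text is made up for the example and is not from the question; the preg_split signature assumed is (pattern, subject, limit, flags).

```php
<?php
// Illustrative input: mixed \r\n and \n endings, one blank line, trailing terminator.
$text = "line one\r\nline two\n\nline three\r\n";

// Default call: empty tokens are kept (the blank line and the piece after the last newline).
$all = preg_split('/\r?\n/', $text);

// A limit of -1 means "no limit"; the flag goes in the 4th position.
$nonEmpty = preg_split('/\r?\n/', $text, -1, PREG_SPLIT_NO_EMPTY);

var_dump(count($all));      // int(5)
var_dump(count($nonEmpty)); // int(3)
```

With the question's own test string "\r\n", the flagged call would return an empty array, because both tokens around the single line break are empty.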

By the way, your use of \r?\n in the split function is not doing any splitting (since split doesn't understand \r and \n in single quotes); it just returns your original string.

Edit: Alternatively, you can use split with a double-quoted regex:

split("\r?\n", $test);

to split your string into a 2-element array.

Upvotes: 2

Niet the Dark Absol

Reputation: 324640

split does not understand \r and \n as special characters, and because you used single quotes, PHP doesn't treat them as special characters either. So split is looking for a literal \n or \r\n.

preg_split, on the other hand, does understand \r and \n as special characters, so even though PHP doesn't treat them as such, PCRE does, and the string is therefore split correctly.

This has nothing to do with lazy/greedy matching; it's all because of the single quotes not parsing \r\n into their newline meanings.
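To make the quoting difference concrete, here is a small sketch (variable names are just for illustration). It only uses preg_split, since split() is not available in PHP 7 and later.

```php
<?php
// Single quotes: PHP keeps the backslashes, so the string holds the
// five characters  \ r ? \ n  exactly as typed.
$single = '\r?\n';

// Double quotes: PHP converts the escapes first, so the string holds
// three characters: a carriage return, a question mark and a line feed.
$double = "\r?\n";

var_dump(strlen($single)); // int(5)
var_dump(strlen($double)); // int(3)

// PCRE understands \r and \n on its own, so the single-quoted pattern
// still matches a real line break, just like the double-quoted one.
$test = "\r\n";
$a = preg_split('/\r?\n/', $test);
$b = preg_split("/\r?\n/", $test);

var_dump($a === $b); // bool(true)
var_dump(count($a)); // int(2)
```

So with preg_split either quoting style gives the same two empty strings here; the quoting only matters for an engine that doesn't interpret the escapes itself.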

Upvotes: 1
