Reputation: 29
Specifically regarding PHP preg_split here, why is this valid:
$words = preg_split("/[\/\s,_-]+/", $string);
while the version below fails with "preg_split() [function.preg-split]: Compilation failed: range out of order in character class at offset 7":
$words = preg_split("/[\s,_-\/]+/", $string);
Note that the only difference is the placement of the forward slash within the regex.
Upvotes: 1
Views: 92
Reputation: 627103
The problem with $words = preg_split("/[\s,_-\/]+/", $string); is that - indicates an invalid range here.
The minus (hyphen) character can be used to specify a range of characters in a character class. For example, [d-m] matches any letter between d and m, inclusive. If a minus character is required in a class, it must be escaped with a backslash or appear in a position where it cannot be interpreted as indicating a range, typically as the first or last character in the class.
There would be no compilation error if the range were valid, i.e. if it ran from a character with a lower code point to a character with a higher one. Here, however, the range _-/ is not valid, because the decimal code point of _ is 95 and that of / is 47.
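The ordering can be checked directly. This is a minimal sketch: the variable names are made up for illustration, and the @ operator is only there to silence the compilation warning so the return value can be inspected.

```php
<?php
// Code points of the two range endpoints in the failing pattern.
$underscore = ord('_'); // 95
$slash      = ord('/'); // 47

// "_-/" would run from 95 down to 47, so the pattern does not compile.
// preg_split() returns false when the pattern fails to compile.
$failed = @preg_split('/[\s,_-\/]+/', 'a b');

var_dump($underscore > $slash); // the range start is higher than its end
var_dump($failed === false);
```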
Try [\s,\/-_]+ and you will see it matching characters you would not want it to match, such as digits and uppercase letters, which all lie between / and _.
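To see the effect of the reversed range, here is a small sketch with a made-up input string. The range /-_ compiles, but it covers every code point from 47 to 95, so digits and uppercase letters are treated as delimiters too.

```php
<?php
// Hypothetical sample input, chosen to contain a digit and an uppercase letter.
$string = 'foo2bar,Baz';

// "/-_" is a valid range (47 up to 95), so this compiles,
// but "2" and "B" fall inside it and are consumed as separators.
$words = preg_split('/[\s,\/-_]+/', $string);

print_r($words); // the pieces are "foo", "bar" and "az"
```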
That is why you should escape the hyphen inside the character class, or place it at the start or end of the class. These are all equally correct regexes: [\/\s,_-]+, [-\/\s,_]+ and [\/\-\s,_]+.
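As a quick check, the three variants can be run against the same input. The sample string and variable names are assumptions for illustration only.

```php
<?php
// Hypothetical input containing every delimiter the class is meant to match.
$string = 'alpha/beta gamma,delta_epsilon-zeta';

$patterns = [
    '/[\/\s,_-]+/',  // hyphen last
    '/[-\/\s,_]+/',  // hyphen first
    '/[\/\-\s,_]+/', // hyphen escaped
];

$results = [];
foreach ($patterns as $pattern) {
    $results[] = preg_split($pattern, $string);
}

// All three patterns split the string into the same six words.
print_r($results[0]);
var_dump($results[0] === $results[1] && $results[1] === $results[2]);
```

Single-quoted PHP strings are used here so the backslashes reach the regex engine unchanged.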
Upvotes: 1