jimchristie

Reputation: 215

Meaning of a dash between mixed characters in regex?

I'm just getting my feet wet with regexes and I came across this in a PHP program that someone else had written: [ -\w]. Note that the dash is not the first character; a space precedes it.

I can't make heads or tails of what it means. I know that a dash between characters inside brackets normally indicates a range, e.g. [a-z] matches any lowercase character "a" through "z", but what does it match when the dash is between characters of different types?

My first thought was that it just matches any space or alphanumeric character, but then the dash wouldn't be necessary. My second thought was that it's matching spaces, alphanumerics, and the dash; but then I realized that the dash would probably be either escaped or moved to the front or back for that.

I've googled around and can't find anything about using a dash in a character class with mixed characters. Maybe I'm using the wrong search terms.

Upvotes: 0

Views: 94

Answers (3)

Toto

Reputation: 91498

In the PCRE reference, §16, we find:

  1. Perl, when in warning mode, gives warnings for character classes such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no warning features, so it gives an error in these cases because they are almost certainly user mistakes.

[ -\w] produces a warning in Perl but not in PHP.

Upvotes: 1

anubhava

Reputation: 785631

Your regex [ -\w] seems to have a misplaced hyphen, as it will only match characters like these:

[ !"#$%&'()*+,./-]

Because the - appears in the middle, it acts as a range between the space (32) and the first \w character (48).
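
The code-point arithmetic behind this reading can be checked with a short Python sketch. It only enumerates ASCII 32 through 47 and compares the result with the class listed above; it does not show how PCRE actually parses [ -\w], and the variable names are illustrative.

```python
# Enumerate the ASCII characters from space (32) up to, but not including, '0' (48).
start = ord(" ")   # 32
stop = ord("0")    # 48, the lowest ASCII code point matched by \w
range_chars = "".join(chr(c) for c in range(start, stop))

# The characters from the explicit class in the answer, in the answer's own order.
listed = " !\"#$%&'()*+,./-"

print(repr(range_chars))
# True when both contain exactly the same characters, regardless of order.
print(sorted(range_chars) == sorted(listed))
```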

Upvotes: 0

Owen

Reputation: 209

This might help: http://www.regular-expressions.info/charclass.html. In the section "Metacharacters Inside Character Classes" it says:

Hyphens at other positions in character classes where they can't form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.

My guess would be that it is being interpreted as a literal, so the regexp would match a space, a hyphen, or \w.

As a reference, it looks invalid in PCRE: Debuggex Demo
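
If the intent really is "space, hyphen, or word character", the ambiguity can be avoided by putting the hyphen last or escaping it. Here is a minimal sketch in Python, chosen only because it is easy to run; Python's re module is a different flavor from PCRE, so this says nothing about what PHP itself does with [ -\w]. The variable names are illustrative.

```python
import re

# Hyphen placed last, or escaped: unambiguously a literal hyphen.
hyphen_last = re.compile(r"[ \w-]")
hyphen_escaped = re.compile(r"[ \-\w]")

for pattern in (hyphen_last, hyphen_escaped):
    assert pattern.fullmatch(" ")          # space
    assert pattern.fullmatch("-")          # literal hyphen
    assert pattern.fullmatch("a")          # word character
    assert pattern.fullmatch("!") is None  # not in the class

# Python's re is one flavor that rejects the ambiguous form outright.
try:
    re.compile(r"[ -\w]")
    ambiguous_ok = True
except re.error:
    ambiguous_ok = False

print(ambiguous_ok)
```

Either unambiguous spelling avoids relying on how a particular flavor resolves a hyphen next to a shorthand class.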

Upvotes: 2
