gremo

Reputation: 48909

I think this regular expression should not fail. What am I missing?

^(?![_\.\'\-])(?:[\p{L} ]+)$

If I understand correctly, there is:

First question: something like "1Bob" should not fail (because of the lookahead). So why does it fail?

Second question: where can I find a list or explanation of the characters in Ll, Lm, Lo, Lt and Lu?

Upvotes: 1

Views: 66

Answers (2)

Bart Kiers

Reputation: 170158

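The lookahead is not what rejects "1Bob": it only rules out a leading _, ., ' or -. The digit "1" is not matched by \p{L} (which matches only letters), so the character class fails on the first character. If you want to allow numeric characters as well, add the class \p{N} (or \p{Nd} for decimal digits only):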

$text = "1Bob";

if (preg_match("/^(?![_\.\'\-])(?:[\p{N}\p{L} ]+)$/u", $text)) {
  echo "Matched!\n";
} else {
  echo "No match...\n";
}

which will print:

Matched!
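To see which part of the original pattern rejects "1Bob", here is a minimal sketch that runs the lookahead on its own next to the full pattern from the question. The variable names and the sample inputs are only for illustration.

```php
<?php
// The original pattern from the question, unchanged.
$full = "/^(?![_\.\'\-])(?:[\p{L} ]+)$/u";
// Only the anchored lookahead, to see what it rejects by itself.
$lookaheadOnly = "/^(?![_\.\'\-])/u";

foreach (["Bob", "Bob Smith", "-Bob", "1Bob"] as $text) {
    // preg_match returns 1 on a match and 0 on no match.
    printf(
        "%-10s lookahead: %d  full pattern: %d\n",
        $text,
        preg_match($lookaheadOnly, $text),
        preg_match($full, $text)
    );
}
```

"1Bob" gets past the lookahead but not the full pattern, so the rejection comes from [\p{L} ], not from (?!...). "-Bob" is already stopped by the lookahead.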

Also, there are small differences between Ruby's regex engine and that of PHP. Since your target language seems to be PHP, I recommend testing it with PHP, not with Rubular (Ruby).

Note that inside character classes, most of the "normal" regex meta chars don't have any special powers and need not be escaped (the exceptions are ], \, a ^ at the very start, and a - between two characters): preg_match("/^(?![_.'-])(?:[\p{N}\p{L} ]+)$/u", $text)

An overview of many Unicode Character Properties/Classes can be found here: http://www.regular-expressions.info/unicode.html
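As a rough illustration of the five letter subcategories asked about (Lu, Ll, Lt, Lm, Lo), the sketch below tests one sample character per category. The sample characters are my own picks, and the "\u{...}" string escapes assume PHP 7.0 or later.

```php
<?php
// One sample character per letter subcategory (chosen for illustration only).
$samples = [
    "Lu" => "A",         // uppercase letter
    "Ll" => "a",         // lowercase letter
    "Lt" => "\u{01C5}",  // titlecase letter (U+01C5)
    "Lm" => "\u{02B0}",  // modifier letter (U+02B0)
    "Lo" => "\u{4E2D}",  // other letter (U+4E2D, a CJK ideograph)
];

foreach ($samples as $category => $char) {
    // Build e.g. /^\p{Lu}$/u; the u modifier makes PCRE treat the subject as UTF-8.
    $inCategory = preg_match('/^\p{' . $category . '}$/u', $char);
    // Every one of these subcategories is also covered by the general \p{L}.
    $isLetter = preg_match('/^\p{L}$/u', $char);
    printf("%s  \\p{%s}: %d  \\p{L}: %d\n", $char, $category, $inCategory, $isLetter);
}
```

Each sample should match both its own subcategory and \p{L}, since \p{L} is the union of the five.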

Upvotes: 3

protist

Reputation: 1220

(?![_\.\'\-])

is the same as

(?![_.'-])

Most metacharacters within bracketed character classes do not require escaping. The dash would require escaping if it sat between two characters, where it would be read as a range. Because the dash is at the end of the bracketed character class here, it does not require escaping either.
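A quick way to check this is to run both forms of the lookahead against the same inputs, and then to compare a dash in the middle of a class with an escaped one. This is only a sketch; the variable names and test strings are made up for the example.

```php
<?php
// Escaped and unescaped forms of the lookahead's character class.
$escaped   = "/^(?![_\.\'\-])/";
$unescaped = "/^(?![_.'-])/";

foreach (["_a", ".a", "'a", "-a", "a", "1a"] as $text) {
    // The two columns should agree for every input.
    printf(
        "%-3s escaped: %d  unescaped: %d\n",
        $text,
        preg_match($escaped, $text),
        preg_match($unescaped, $text)
    );
}

// A dash between two characters forms a range, so there escaping does matter.
var_dump(preg_match('/^[a-c]$/', 'b'));   // range a..c: "b" matches
var_dump(preg_match('/^[a\-c]$/', 'b'));  // literal a, -, c: "b" does not match
var_dump(preg_match('/^[a\-c]$/', '-'));  // the literal dash matches
```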

Upvotes: 1
