Nick Van Hoogenstyn

Reputation: 1317

Nesting [ ] with PHP RegExp

I'm trying to ensure that a string in PHP contains only letters, hyphens, or apostrophes. To accomplish this I wanted to define a set of valid characters using [ ]. So my idea was to do this:

[[A-Za-z]-'] // Weird syntax highlighting here

Will this work? Is it possible to nest brackets like that? This is meant to match a single character that is either a letter, a hyphen, or an apostrophe. I may be approaching the problem naively and that's OK; I just want to know whether putting brackets within brackets like this is legal in PHP. Thanks!

Upvotes: 3

Views: 174

Answers (4)

ridgerunner

Reputation: 34395

To ensure that a string contains only the desired characters you can do it two ways:

  • You know it's good if all chars in the string are valid.
  • You know it's bad if any one char in the string is invalid.

Here is a PHP snippet that demonstrates both methods:

// Method 1: Good if all chars in the string are valid.
$re_all_valid = '/^[A-Za-z\-\']*$/';
if (preg_match($re_all_valid, $text)) {
    echo("GOOD: String contains all valid characters.\n");
} else {
    echo("BAD: String does NOT contain all valid characters.\n");
}

// Method 2: Bad if any one char in the string is invalid.
$re_one_invalid = '/[^A-Za-z\-\']/';
if (preg_match($re_one_invalid, $text)) {
    echo("BAD: String contains at least one invalid character.\n");
} else {
    echo("GOOD: String contains no invalid characters.\n");
}

Notes: Method 1 requires anchors at both ends of the string and a quantifier applied to the positive character class. Method 2 uses a negated character class and only needs to match one character in the string. Method 2 is likely more efficient.
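
One difference between the two methods is worth checking: in PCRE, $ also matches just before a trailing newline, so Method 1 as written accepts "Smith\n" while Method 2 rejects it. Here is a minimal sketch that wraps the two patterns above in functions and adds a \z variant of Method 1. The function names are made up for illustration and are not part of any library.

```php
<?php
// Hypothetical helper names; the first two patterns are the ones shown above.

// Method 1: every character must belong to the class.
function is_all_valid(string $text): bool
{
    return preg_match('/^[A-Za-z\-\']*$/', $text) === 1;
}

// Method 2: look for a single character outside the class.
function has_invalid_char(string $text): bool
{
    return preg_match('/[^A-Za-z\-\']/', $text) === 1;
}

// Variant of Method 1: \z matches only at the very end of the string,
// so a trailing newline is no longer tolerated.
function is_all_valid_strict(string $text): bool
{
    return preg_match('/^[A-Za-z\-\']*\z/', $text) === 1;
}

foreach (["O'Brien-Smith", "R2-D2", "Smith\n"] as $text) {
    printf(
        "%s: method1=%s method2_invalid=%s strict=%s\n",
        json_encode($text),
        var_export(is_all_valid($text), true),
        var_export(has_invalid_char($text), true),
        var_export(is_all_valid_strict($text), true)
    );
}
```

If trailing newlines matter for your input, the \z form (or the D modifier) makes Method 1 agree with Method 2.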

Upvotes: 0

tchrist

Reputation: 80415

[\pL\p{Pd}'ʹ’]

That matches:

  • any Letter character
  • any Dash Punctuation character
  • U+0027 APOSTROPHE (which is not the preferred form)
  • U+02B9 MODIFIER LETTER PRIME
  • U+2019 RIGHT SINGLE QUOTATION MARK
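
To use that class from PHP, the pattern needs the u modifier so that PCRE treats both pattern and subject as UTF-8. A rough sketch, assuming PHP 7 or later (for the "\u{...}" string escapes) and UTF-8 input; the function name is hypothetical, and the two non-ASCII characters are written as \x{02B9} and \x{2019} so the example does not depend on the source file's encoding:

```php
<?php
// Hypothetical helper: true if $text is non-empty and consists only of
// letters, dash punctuation, and the three apostrophe-like characters above.
function is_name_chars(string $text): bool
{
    // u modifier: pattern and subject are UTF-8; \z anchors at the true end.
    return preg_match('/^[\pL\p{Pd}\'\x{02B9}\x{2019}]+\z/u', $text) === 1;
}

var_dump(is_name_chars("Jos\u{00E9}"));        // accented letter
var_dump(is_name_chars("O\u{2019}Brien"));     // U+2019 as apostrophe
var_dump(is_name_chars("Anne\u{2010}Marie"));  // U+2010 HYPHEN is Dash Punctuation
var_dump(is_name_chars("R2-D2"));              // digits are not letters
```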

Upvotes: 0

David Z

Reputation: 131600

I'm assuming you're using this in one of the regular expression matching functions (like preg_match("/[[A-Za-z]-']*/", ...)), in which case it's a question of regular expression syntax, not PHP syntax. And the answer is no, you can't nest brackets like that. If you want a regular expression that matches only a letter, hyphen, or apostrophe, use [A-Za-z'-]. (The hyphen goes last so that the regex engine knows it's a literal hyphen rather than part of a range like A-Z. Alternatively, you can escape the hyphen with a backslash and then put it anywhere: [A-Za-z\-'].)
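
To see what the nested form actually does, here is a small sketch. As far as I can tell, PCRE reads [[A-Za-z]-'] as the class [[A-Za-z] (a literal [ or a letter) followed by the three literal characters -']. The variable names are only for illustration.

```php
<?php
// The "nested" attempt: parsed as the class [[A-Za-z] and then literal -']
$nested = '/^[[A-Za-z]-\']$/';

// The flat class: one letter, apostrophe, or hyphen (hyphen last, so it is literal)
$flat = '/^[A-Za-z\'-]$/';

var_dump(preg_match($nested, "a"));     // a single letter is not enough
var_dump(preg_match($nested, "a-']"));  // a letter followed by -'] matches
var_dump(preg_match($flat, "a"));
var_dump(preg_match($flat, "-"));
var_dump(preg_match($flat, "'"));
var_dump(preg_match($flat, "1"));
```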

Upvotes: 3

Lightness Races in Orbit

Reputation: 385204

I don't understand.

What's wrong with

[A-Za-z'-]

?

Upvotes: 1
