Torads

Reputation: 69

preg_match PHP explanation

I have a question about one character in the preg_match syntax below, and I just want to understand it completely. \w matches alphanumeric characters and the underscore.

My question is what does the \ mean after \w and before the @ sign?

Does this mean that it will allow:

  1. any alphanumeric
  2. any backslash
  3. any dash

or is this backslash meant to single out the character that follows?

When I test it in the w3schools.com example, an email address containing backslashes validates, but the backslashes are removed when the address is echoed out.

$email = test_input($_POST["email"]);
// check if e-mail address syntax is valid
if (!preg_match("/([\w\-]+\@[\w\-]+\.[\w\-]+)/", $email)) {
    $emailErr = "Invalid email format";
}

Upvotes: 1

Views: 92

Answers (1)

Casimir et Hippolyte

Reputation: 89639

The backslash escapes characters that have a special meaning in a regex, so that they are treated as literal characters. Outside a character class, there are twelve characters that must be escaped: [ { ( ) . ? * + | \ ^ $

If I want to write a literal $ in a pattern, I must write \$
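
As a rough illustration of the escaping rule above, here is a minimal sketch. The subject strings and variable names are made up for the example and do not come from the question. It also uses preg_quote(), which escapes regex metacharacters in a string for you.

```php
<?php
// Hypothetical subject string, chosen only to show the effect of escaping.
$price = 'Total: $5.00';

// \$ and \. are literals: the pattern matches the text "$5.00".
$literal = preg_match('/\$5\.00/', $price);

// An unescaped . matches any character, so "$5x00" is accepted here...
$loose = preg_match('/\$5.00/', 'Total: $5x00');

// ...but not once the dot is escaped.
$strict = preg_match('/\$5\.00/', 'Total: $5x00');

// preg_quote() adds the backslashes for you.
$quoted = preg_quote('$5.00');

var_dump($literal, $loose, $strict, $quoted);
```

With these inputs, $literal and $loose should be 1, $strict should be 0, and $quoted should hold the string \$5\.00.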

Note: you don't need to escape { if the situation is not ambiguous (that is, if it can't be read as the start of a quantifier {m,n} or {m}).

Note 2: The delimiter of the pattern must be escaped too, inside and outside a character class.
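
To make the delimiter rule concrete, here is a small sketch. The path string and variable names are assumptions for the example. PHP looks for the closing delimiter before the pattern is compiled, which is why an unescaped / fails even inside a character class.

```php
<?php
// Hypothetical subject string for the example.
$path = 'docs/api/v2';

// With / as the delimiter, a literal / must be escaped outside a class...
$outside = preg_match('/api\/v2/', $path);

// ...and inside a class as well.
$inside = preg_match('/[\/]v2/', $path);

// Unescaped, the pattern ends at the second /, and "]v2/" is read as
// modifiers. preg_match() then emits a warning (suppressed here with @)
// and returns false.
$broken = @preg_match('/[/]v2/', $path);

// Picking a delimiter that does not occur in the pattern avoids the escapes.
$alt = preg_match('#api/v2#', $path);

// preg_quote() escapes the delimiter too when it is passed as 2nd argument.
$quoted = preg_quote('api/v2', '/');

var_dump($outside, $inside, $broken, $alt, $quoted);
```

Switching to another delimiter such as # is often the more readable option when the pattern contains several slashes.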

Inside a character class, these characters no longer need to be escaped, since they lose their special meaning and are treated as literals (the backslash itself is the exception: it still acts as the escape character). However, three characters have a special meaning when they appear at a particular position in the character class. These characters are: ^ - ]

^ at the first position negates the character class ([^M] matches any character that is not M). If you want to use it as a literal character at the first position, you must write: [\^]

- between two characters defines a character range ([a-z]). This means you don't need to escape it at the beginning of the class (or immediately after ^) or at the end. You only need to escape it between two characters. - is treated as a literal (and doesn't define a range) in all of these examples:

[-abcd]
[^-abcd]
[abcd-]
[ab\-cd]
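[\s-abcd]   # because \s is not a single character (PHP 7.3+, which uses PCRE2, rejects this form as an invalid range, so prefer one of the forms above)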
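
The hyphen positions listed above can be checked with a short sketch like the following. The variable names are made up for the example, and the [\s-abcd] form is left out on purpose because its handling differs between PCRE versions.

```php
<?php
// Each of these classes contains a literal hyphen, so each should match "-".
$classes = ['[-abcd]', '[abcd-]', '[ab\-cd]'];
$results = [];
foreach ($classes as $class) {
    $results[$class] = preg_match('/^' . $class . '$/', '-');
}

// In the negated class the hyphen is one of the excluded characters.
$negated = preg_match('/^[^-abcd]$/', '-');

// Between two characters the hyphen defines a range instead.
$rangeOnHyphen = preg_match('/^[a-d]$/', '-');
$rangeOnLetter = preg_match('/^[a-d]$/', 'c');

var_dump($results, $negated, $rangeOnHyphen, $rangeOnLetter);
```

Here every entry of $results should be 1, $negated and $rangeOnHyphen should be 0, and $rangeOnLetter should be 1.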

] closes the character class, so it must be escaped except at the first position or immediately after the ^. []] and [^]] are correct.
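
A quick sketch of the ] and ^ rules, assuming PHP's default PCRE options (the variable names are only illustrative):

```php
<?php
// ] at the first position is a literal, so []] matches "]".
$closeFirst = preg_match('/^[]]$/', ']');

// [^]] is the negated version: it rejects "]" and accepts anything else.
$negatedOnBracket = preg_match('/^[^]]$/', ']');
$negatedOnOther = preg_match('/^[^]]$/', 'x');

// Anywhere else in the class, ] must be escaped.
$escapedBracket = preg_match('/^[a\]]$/', ']');

// ^ is a literal when escaped at the first position or when placed later.
$caretEscaped = preg_match('/^[\^]$/', '^');
$caretLater = preg_match('/^[a^]$/', '^');

var_dump($closeFirst, $negatedOnBracket, $negatedOnOther,
         $escapedBracket, $caretEscaped, $caretLater);
```

All of these should return 1 except $negatedOnBracket, which should be 0.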

If I write the pattern without the unneeded backslashes, I obtain:

/([\w-]+@[\w-]+\.[\w-]+)/

To answer your question ("What does it mean?"): nothing. The regex engine ignores unneeded escapes, so \@ matches just @ and [\w\-] matches the same characters as [\w-]. Neither one allows a backslash.
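
If you want to check this yourself, here is a minimal sketch comparing the two patterns. The sample addresses and the anchored variant at the end are my own assumptions, not part of the original code. It also suggests why an address with a backslash still validates: the pattern is not anchored, so it only has to match some part of the input.

```php
<?php
$escaped = '/([\w\-]+\@[\w\-]+\.[\w\-]+)/'; // pattern from the question
$clean = '/([\w-]+@[\w-]+\.[\w-]+)/';       // same pattern, no extra escapes

// Hypothetical inputs; the last one contains a literal backslash.
$samples = ['user@example.com', 'first-last@my-host.org', 'no-at-sign', 'a\b@example.com'];

// Both patterns should give the same result for every sample.
$same = true;
foreach ($samples as $sample) {
    if (preg_match($escaped, $sample) !== preg_match($clean, $sample)) {
        $same = false;
    }
}

// The backslash sample passes only because the part after the backslash matches.
$found = preg_match($clean, 'a\b@example.com', $m);

// Illustrative anchored variant: the whole string must fit, so it fails.
$anchored = preg_match('/^[\w-]+@[\w-]+\.[\w-]+$/', 'a\b@example.com');

var_dump($same, $found, $m[0], $anchored);
```

With these inputs, $m[0] should be b@example.com, which shows that the backslash was skipped rather than accepted by the character class.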

Upvotes: 1
