ratatosk

Reputation: 393

Why does this regex match anything

The regular expression

\\(?:[A-Za-z@]+|.)

is used for latex syntax highlighting in Texworks.

Why does this expression match anything besides \? As I understand it, the lookahead is not matching anything (it only checks whether the condition is true). This expression is used to match LaTeX commands, which usually look like \command but also include special characters such as \%, \|, ..., hence the . in the regex.

Can somebody explain why?

Upvotes: 0

Views: 135

Answers (3)

acarlon

Reputation: 17272

(?:...) is a non-capturing group; (?=...) is a lookahead. The (?:) is there so that the | applies only to [A-Za-z@]+ and .. Without the (?:), the | would apply to \\[A-Za-z@]+ and . as a whole. That version would match almost any string, because the . alone matches any character other than a newline, not just strings that contain a \ (more details on the matches follow). Since the group only limits the scope of the |, there is no need to capture anything, which is why ?: is used.
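To see that the (?:) only scopes the | and changes nothing about what is matched, here is a small comparison with a capturing version. This is a sketch in Python's re module, chosen for illustration; Texworks uses its own regex engine, which may differ in details. Both patterns match the same text; only the capturing one reports a group.

```python
import re

# The pattern from the question (non-capturing) and a capturing variant for comparison
NON_CAPTURING = re.compile(r"\\(?:[A-Za-z@]+|.)")
CAPTURING = re.compile(r"\\([A-Za-z@]+|.)")


def compare(text):
    """Return (whole match, captured groups) for each pattern."""
    m1 = NON_CAPTURING.search(text)
    m2 = CAPTURING.search(text)
    return (m1.group(0), m1.groups()), (m2.group(0), m2.groups())


print(NON_CAPTURING.groups, CAPTURING.groups)  # 0 1
print(compare(r"x \section{y}"))
```

The whole match is \section in both cases; the capturing version additionally stores "section" as group 1, which a syntax highlighter does not need.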

Looking at the regex:

\\ matches a literal \ (the backslash has to be escaped in the regex). The string doesn't need to start with a \ to match: the regex will match \abc, but it will also match the string a\abc, and the match result will be \abc in both cases.
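A quick Python check of the point about not needing to start with a \ (a sketch assuming Python's re semantics, where search() scans the whole string and match() only tries position 0):

```python
import re

PATTERN = re.compile(r"\\(?:[A-Za-z@]+|.)")

# search() scans the whole string, so the backslash need not be at position 0
for text in [r"\abc", r"a\abc"]:
    print(repr(text), "->", PATTERN.search(text).group())

# match() only tries position 0, where 'a' is not a backslash
print(PATTERN.match(r"a\abc"))  # None
```

A highlighter presumably behaves like search() here, finding commands anywhere in a line.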

[A-Za-z@]+ - + means one or more, so it matches one or more of the characters inside the []. This means that strings such as \a, \abc and \a@b will be matched.

| means OR.

. matches any single character (except a newline, by default). This means that strings such as \a, \# and \, will be matched. So the first character after the \ can be any character, but if it is not in [A-Za-z@], the match will include only that one character. For example, in \#a only \# will be matched.
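The single-character fallback and the newline exception can be checked directly. This sketch uses Python's re and its re.DOTALL flag; whether Texworks exposes an equivalent option is not something I can confirm.

```python
import re

PATTERN = re.compile(r"\\(?:[A-Za-z@]+|.)")
PATTERN_DOTALL = re.compile(r"\\(?:[A-Za-z@]+|.)", re.DOTALL)

# '#' is not in [A-Za-z@], so the '.' branch consumes just that one character
print(PATTERN.search(r"\#a").group())  # \#

# By default '.' does not match a newline, so a backslash at the end of a line is not matched
print(PATTERN.search("\\\n"))  # None

# With re.DOTALL, '.' also matches the newline
print(PATTERN_DOTALL.search("\\\n").group() == "\\\n")  # True
```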

Examples of matches, with the match result shown after the arrow:

  • \abc@ → \abc@
  • ab\abc@ → \abc@
  • ab\abc@#a → \abc@
  • \#abc@ → \#
  • ab\#abc@#a → \#
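The same examples run through Python's re (an illustration only; search() returns the first, leftmost match, which is what these examples show):

```python
import re

PATTERN = re.compile(r"\\(?:[A-Za-z@]+|.)")

# The example strings from the list of matches
examples = [r"\abc@", r"ab\abc@", r"ab\abc@#a", r"\#abc@", r"ab\#abc@#a"]
for text in examples:
    print(f"{text:<12} -> {PATTERN.search(text).group()}")
```

For the last two strings the # directly after the \ sends the regex into the . branch, so the letters that follow are not part of the match.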

Examples of strings that will not match (note that these strings would have matched if the (?:) were removed):

  • \
  • abcabc
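A sketch comparing the grouped pattern with the version where the (?:) is removed, again in Python's re. Without the group, | splits the whole pattern into \\[A-Za-z@]+ and a bare ., and the bare . matches the first character of almost anything.

```python
import re

GROUPED = re.compile(r"\\(?:[A-Za-z@]+|.)")
UNGROUPED = re.compile(r"\\[A-Za-z@]+|.")  # | now applies to the whole pattern


def first_match(pattern, text):
    """Return the first match as a string, or None if there is no match."""
    m = pattern.search(text)
    return m.group() if m else None


for text in ["\\", "abcabc"]:
    print(repr(text), "grouped:", first_match(GROUPED, text),
          "ungrouped:", first_match(UNGROUPED, text))
```

With the group, neither string matches; without it, a lone \ matches itself and abcabc matches its first a.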

Upvotes: 2

l'L'l

Reputation: 47219

The pattern is only going to match \ together with the character(s) directly following it: either a run of letters and @, or a single other character.

  1. The pattern matches the character \ literally.
  2. Next you have a non-capturing group (?:[A-Za-z@]+|.), whose first alternative, [A-Za-z@]+, matches one or more letters or @.
  3. The second alternative in the same group, ., matches any single character (except newline).
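The breakdown above can be written into the pattern itself using Python's re.VERBOSE flag, where whitespace is ignored and # starts a comment. This is an illustrative sketch; the check at the end confirms it behaves like the compact form on a few sample strings.

```python
import re

# The same pattern with re.VERBOSE, one commented piece per line
VERBOSE = re.compile(r"""
    \\              # a literal backslash
    (?:             # non-capturing group: only limits the scope of |
        [A-Za-z@]+  #   alternative 1: one or more letters or @
      |             #   or
        .           #   alternative 2: any single character except newline
    )
""", re.VERBOSE)

COMPACT = re.compile(r"\\(?:[A-Za-z@]+|.)")

samples = [r"\section", r"\%", r"\a@b", "no command"]
for text in samples:
    a, b = VERBOSE.search(text), COMPACT.search(text)
    print(repr(text), a.group() if a else None, b.group() if b else None)
```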

There's no lookahead here; for reference, here's a list of regex expressions.

[Image: list of regex expressions for reference]

Upvotes: 1

Deepu

Reputation: 7610

In the given regex,

\\(?:[A-Za-z@]+|.)

() is a group operator (here in its non-capturing form, (?:...)). The regex treats the entries inside the group operator as a single unit.

So the regular expression matches strings like \., \|, \a, etc.

Moreover, the regular expression will not match a lone \, since the group requires at least one character after it.
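In the highlighting use case from the question, the regex is applied repeatedly along a line. A sketch of that with Python's re.findall; the sample LaTeX line is made up for illustration, and Texworks' actual highlighting loop may work differently.

```python
import re

COMMAND = re.compile(r"\\(?:[A-Za-z@]+|.)")

# Hypothetical LaTeX line mixing letter commands, symbol commands and an @-command
line = r"\a \. \| \textbf{50\%} \makeatletter\@gobble"
print(COMMAND.findall(line))

# A lone backslash has nothing after it for the group to consume
print(COMMAND.fullmatch("\\"))  # None
```

Because @ is in the character class, \@gobble is read as one command, and \makeatletter stops at the next \ since \ is not in [A-Za-z@].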

Upvotes: 2
