Steve Armstrong

Reputation: 5392

Python regex matches string it shouldn't

I'm totally lost as to how this regex matches this string in Python. Could someone make sense of it, please?

import re
regex = "^PHP/5.\\{3|2\\}.\\{1|2|3|4|5|6|7|8|9|0\\}\\{1|2|3|4|5|6|7|8|9|0\\}$"
ua = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'
re.compile(regex).search(ua)

The regex starts with PHP, while the string does not. Shouldn't that simply disqualify a match from happening?

Upvotes: 0

Views: 167

Answers (2)

Sam

Reputation: 20486

You need grouping (preferably non-capturing) for your alternation:

PHP/5.\\{(?:3|2)\\}.\\{(?:1|2|3|4|5|6|7|8|9|0)\\}\\{(?:1|2|3|4|5|6|7|8|9|0)\\}$
         ^^    ^       ^^                    ^      ^^                    ^

Otherwise you will be alternating the entire expression:

  • PHP/5.\\{3 or
  • 2\\}.\\{1 or
  • 2 or
  • 3 or
  • 4 or
  • 5 match found!

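Think of operator precedence, as in PEMDAS or nested conditionals (if (a && (b || c)) { }): | binds more loosely than anything else in a regex, so without parentheses it splits the whole pattern.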
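To see the difference, here is a minimal sketch that runs both patterns against the user agent from the question. The names ungrouped and grouped are just illustrative, and the grouped pattern keeps the leading ^ from the question.

```python
import re

ua = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'

# Pattern from the question: every | splits the whole expression.
ungrouped = r"^PHP/5.\{3|2\}.\{1|2|3|4|5|6|7|8|9|0\}\{1|2|3|4|5|6|7|8|9|0\}$"

# Same pattern with each alternation wrapped in a non-capturing group.
grouped = r"^PHP/5.\{(?:3|2)\}.\{(?:1|2|3|4|5|6|7|8|9|0)\}\{(?:1|2|3|4|5|6|7|8|9|0)\}$"

ungrouped_match = re.search(ungrouped, ua)
grouped_match = re.search(grouped, ua)

print(ungrouped_match is not None)  # True: a bare alternative matched
print(grouped_match is None)        # True: the string does not start with PHP
```

Note that \{ and \} match literal braces, so the grouped pattern only accepts strings such as PHP/5.{3}.{1}{2}. Whether that is the intended format is not clear from the question.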

Upvotes: 5

thefourtheye

Reputation: 239443

Your regex fails because | has the lowest precedence of any regex operator, so it splits the entire pattern into separate alternatives. The string is tested against each of them in turn:

  • ^PHP/5.\\{3

  • 2\\}.\\{1

and so on. Since one of those alternatives is the bare 5 (from 4|5|6), the regex matches the 5 in Mozilla/5.0.
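You can confirm this yourself by inspecting the match object. The sketch below reuses the code from the question. Splitting the pattern on "|" is only a rough way to list the alternatives, and it works here only because the pattern contains no groups or escaped pipes.

```python
import re

regex = "^PHP/5.\\{3|2\\}.\\{1|2|3|4|5|6|7|8|9|0\\}\\{1|2|3|4|5|6|7|8|9|0\\}$"
ua = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'

m = re.compile(regex).search(ua)

# The match object reports which text was matched and where.
print(m.group())  # 5
print(m.span())   # (8, 9)

# Rough view of the top-level alternatives (valid only for this pattern,
# which has no groups or escaped pipes).
alternatives = regex.split("|")
for alt in alternatives[:6]:
    print(alt)
```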

You can see an online demo and an explanation of this behaviour here.

Debuggex Demo

Upvotes: 4
