DaveWalker

Reputation: 541

Python re.match - a successful match result for a digit is a letter?

Some really odd behaviour I can’t work out with what should be a simple regex in Python 3.7...

I have a string msg_data which contains event=mynode+button+0.

If I use the pattern r'^event=(?P<node>[\w-]{1,19})+(?P<interface>[\w-]{1,19})+' it works as expected: if I run params = re.match(pattern, msg_data) then params.group('node') is "mynode" and params.group('interface') is "button". All fine so far.

However, I can't match the 0 at the end...

If I add (?P<duration>[\d]+) to the end of my pattern, I get no match, so params is None. The same happens if I try [0-9] in the regex, and it won't match even if I put a literal 0 in the pattern.

However, if I add (?P<duration>[\w]+) to the end of my pattern, it matches - but gives params.group('duration') = "s"!!

Note that if the 0 is a 1, then it matches the letter r instead.

So the obvious question... what's going on? I've got loads of other regex patterns matching numbers fine. The msg_data string is coming from a http POST event, but is created as a 0 and prints as a 0 at all points elsewhere in the code.

Any thoughts on what could be causing this behaviour? It's been driving me crazy for two days - a really simple regex that just doesn't match what it should.

Thanks!

Upvotes: 0

Views: 81

Answers (1)

DaveWalker

Reputation: 541

Thanks to @user2722968 for the answer - the basic error was not escaping the literal + in my pattern.
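For anyone landing here later, this is a minimal sketch of what the escaped pattern could look like, assuming msg_data holds exactly the string from the question. The trailing $ anchor and the plain \d+ for the duration are my own additions for illustration, not part of the original pattern.

```python
import re

# Sample value taken from the question.
msg_data = "event=mynode+button+0"

# Each literal plus sign is written as \+ so it is no longer read as a
# "one or more" quantifier on the preceding group.
# Assumption: the trailing $ and the \d+ duration group are illustrative choices.
pattern = r'^event=(?P<node>[\w-]{1,19})\+(?P<interface>[\w-]{1,19})\+(?P<duration>\d+)$'

params = re.match(pattern, msg_data)
if params is None:
    print("no match")
else:
    # Prints: mynode button 0
    print(params.group('node'), params.group('interface'), params.group('duration'))
```

Note that re.match returns None rather than False when nothing matches, so the check is written as `params is None`.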

The impact was then slightly obscure, and I made the key mistake of not actually checking that params.group('node') and params.group('interface') were as expected. They weren't. params.group('node') matched all but the last two characters of the first part of msg_data, params.group('interface') matched the second-to-last character, and params.group('duration') matched the last. The s versus r difference arose because I had several buttons on the webpage, each with a different variation of mynode, and I tried different ones on various occasions. r and s were simply the last letter of the node name for whichever button I pressed. Again, a bad assumption on my part!
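The behaviour can be reproduced with a short sketch like the one below, assuming the literal sample string from the question (so the stray letter is e, the last letter of mynode, rather than r or s). The variable names broken and strict are only for illustration.

```python
import re

# Sample value taken from the question.
msg_data = "event=mynode+button+0"

# Unescaped: each + after a group means "repeat this group one or more times",
# and [\w-] cannot match a literal '+', so the match never gets past "mynode".
broken = r'^event=(?P<node>[\w-]{1,19})+(?P<interface>[\w-]{1,19})+(?P<duration>[\w]+)'

m = re.match(broken, msg_data)
print(m.group(0))     # event=mynode
print(m.groupdict())  # {'node': 'myno', 'interface': 'd', 'duration': 'e'}

# With [\d]+ the last group needs a digit, but "mynode" contains none,
# so the three groups cannot share it out and the match fails.
strict = r'^event=(?P<node>[\w-]{1,19})+(?P<interface>[\w-]{1,19})+(?P<duration>[\d]+)'
print(re.match(strict, msg_data))  # None
```

Printing m.group(0) or m.groupdict() straight away would have shown that the whole match stopped before the first plus sign.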

So great answer, thanks, but bad question on my part - too many assumptions of the "obvious", without checking. My bad, lesson learnt :(

Upvotes: 0
