sdasdadas

Reputation: 25106

Why does my regular expression in Python not capture integers?

I am using a regular expression to find sequences of digits that aren't prefixed with 0x.

For example:

0
50505
20201
0012

My regular expression is (?:[^(0x)]\d+), which (in my head) translates to "match any sequence of digits that doesn't start with 0x". But this doesn't work. What incorrect assumption am I making?

Upvotes: 2

Views: 79

Answers (2)

falsetru

Reputation: 369134

[^(0x)] in your regular expression is a character class: it matches any single character that is not (, 0, x, or ). The parentheses do not group anything inside the brackets.
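
To see what the original pattern actually does, here is a small sketch that runs it against a sample string (the sample string is my own choice, not something from the question). Each match is one non-excluded character followed by digits, so a leading space gets swallowed, the 0 at the very start is missed, and the digits after 0x are still picked up.

```python
import re

# The pattern from the question: one character that is not '(', '0', 'x'
# or ')', followed by one or more digits.
pattern = re.compile(r'(?:[^(0x)]\d+)')

text = '0 0x111 50505 20201 0012'
matches = pattern.findall(text)
print(matches)
# [' 0', '111', ' 50505', ' 20201', ' 0012']
```

Note that ' 0' here is the space plus the 0 of 0x111, not the standalone 0 at the start of the string.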

Use negative lookbehind:

>>> re.findall(r'(?<!0x)\d+\b', '0 0x111 50505 20201 0012')
['0', '11', '50505', '20201', '0012']

From http://docs.python.org/2/library/re.html

(?<!...)

Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
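
As a rough illustration of the fixed-length requirement quoted above, the sketch below compiles one lookbehind with a fixed-width body and one with a variable-width body. The second pattern, (?<!0x+), is only a made-up example to trigger the error; it is not something you would need for this problem.

```python
import re

# Fixed-width lookbehind body ('0x' is always two characters): accepted.
fixed = re.compile(r'(?<!0x)\d+')

# Variable-width lookbehind body ('0x+' can be two or more characters):
# the standard re module refuses to compile it.
try:
    re.compile(r'(?<!0x+)\d+')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print('compile failed:', exc)
```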


UPDATE

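The lookbehind version still returns 11 from 0x111, because the position after the first 1 is no longer preceded by 0x. Use the following regular expression, which requires a word boundary on both sides of the digits: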

>>> re.findall(r'\b\d+\b', '0 0x111 50505 20201 0012')
['0', '50505', '20201', '0012']
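
If you want this in reusable form, a minimal sketch could look like the following. The names DECIMAL_TOKEN and find_decimal_integers are hypothetical, and the extra sample tokens (0x1F, abc123) are my own additions to show the edge cases.

```python
import re

# Hypothetical names; the pattern itself is the \b\d+\b shown above.
DECIMAL_TOKEN = re.compile(r'\b\d+\b')


def find_decimal_integers(text):
    """Return every run of digits that stands alone as a whole word."""
    return DECIMAL_TOKEN.findall(text)


sample = '0 0x111 0x1F 50505 abc123 0012'
result = find_decimal_integers(sample)
print(result)
# ['0', '50505', '0012']
```

One thing to be aware of: the word boundaries reject digits glued to any letter or underscore, not just to 0x, which is why abc123 is skipped as well. That is probably what you want here, but it is a broader rule than "not prefixed with 0x".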

Upvotes: 5

erewok

Reputation: 7835

That looks like a non-capturing group that matches one character that isn't (, 0, x, or ), followed by digits.

Did you try a negative look-behind?

(r'(?<!0x)\d+')

(Took me a helluva long time to type this on the phone and now I see someone's already sorted it. Lesson learned: no regex on the phone.)

Upvotes: 1
