Reputation: 317
I'm playing with regexes in Python. I know there is a ton of documentation on this, but I just can't understand this apparently simple example.
Given this code:
import re
phoneNumRegex = re.compile(r'(\d\d\d)*')
mo = phoneNumRegex.search('My number is 415-555-4242. 423-531-5412')
print(mo.group())
I'm expecting to get the output:
415, 555, 423, 531
However, the program only prints an empty string. My logic was to specify a group of 3 digits, and then the * says to match this kind of group 0 or more times. Since there are multiple 3-digit groups in my string, I was expecting to get all of them printed. What am I doing wrong?
I also tried + instead of *, which by my understanding should find the group at least once. With that, it only prints the first group and not all of them, as I would expect. How should I write this to get all 3-digit groups printed?
Upvotes: 3
Views: 1498
Reputation: 626794
You have defined a repeated capturing group. The (\d\d\d)* pattern matches any 3 digits, zero or more times (due to the * quantifier), and captures them into capturing group 1. That is, if there is no digit at a given location in the string, the pattern matches an empty string there (and the group does not participate in the match), and if there are 6 consecutive digits, it matches them all, but the capturing group buffer contains only the last 3. See your pattern demo with multiple matching enabled.
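To illustrate the repeated capturing group behavior described above, here is a minimal sketch. The sample strings '123456' and 'abc' are made up for illustration and are not from the question.

```python
import re

pattern = re.compile(r'(\d\d\d)*')

# Six consecutive digits: the whole run is matched,
# but group 1 keeps only the last repetition.
m = pattern.match('123456')
whole = m.group()    # '123456'
last = m.group(1)    # '456'

# No digit at the start: the match still succeeds, with zero repetitions.
m_empty = pattern.match('abc')
empty_whole = m_empty.group()    # ''
empty_group = m_empty.group(1)   # None, the group did not participate

print(whole, last, repr(empty_whole), empty_group)
```

Note that Python reports a group that took no part in the match as None from group(1), while re.findall shows it as an empty string.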
However, in your code, you are using re.search, which only returns a single (the first) match. Since the engine tries to match the string from left to right, it checks the starting position and finds M. It is not a digit, so the pattern matches an empty string before M (due to the * quantifier).
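A quick way to see this is to inspect the span of the match object. The snippet below is a small sketch that reuses the string from the question.

```python
import re

text = 'My number is 415-555-4242. 423-531-5412'
mo = re.search(r'(\d\d\d)*', text)

# The first successful match is the empty one at the very start of the string.
span = mo.span()       # (0, 0)
matched = mo.group()   # ''

print(span, repr(matched))
```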
So, if you use re.findall with this pattern, you will get many empty strings in the resulting list.
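As a rough check of that claim, the sketch below runs re.findall with the original pattern on the string from the question and separates the empty results from the rest. The exact number of empty strings can differ between Python versions, so it is not spelled out here.

```python
import re

text = 'My number is 415-555-4242. 423-531-5412'
results = re.findall(r'(\d\d\d)*', text)

# Every position where no 3-digit chunk starts yields an empty match.
empties = results.count('')
non_empty = [r for r in results if r]

print(empties)
print(non_empty)  # ['415', '555', '424', '423', '531', '541']
```

The non-empty entries also include '424' and '541', which are pieces of the 4-digit numbers rather than standalone 3-digit groups.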
As a quick fix, you could use the + quantifier (1 or more repetitions), but for each run of digits it would still return only the last 3-digit chunk captured in that run.
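Here is a small sketch of the + variant. The second input, '123456789 and 415', is a made-up string chosen to show what happens with a longer run of digits.

```python
import re

text = 'My number is 415-555-4242. 423-531-5412'

# search with + no longer matches the empty string, but still stops at the first match.
first = re.search(r'(\d\d\d)+', text).group()   # '415'

# findall returns group 1, which holds only the last repetition of each match.
chunks = re.findall(r'(\d\d\d)+', '123456789 and 415')   # ['789', '415']

print(first, chunks)
```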
The solution is to use a multiple-matching method, like re.findall or re.finditer, without an enclosing quantified group: r'\d{3}'. If you need to match a 3-digit number that is not adjacent to other digits, use r'(?<!\d)\d{3}(?!\d)', or r'\b\d{3}\b' to match the 3-digit chunks as whole words. See a sample regex demo.
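The three patterns can be compared side by side. This is a sketch using the string from the question, plus one made-up input ('id_123') to show where the lookaround and word-boundary versions differ.

```python
import re

text = 'My number is 415-555-4242. 423-531-5412'

# Plain \d{3}: also picks up pieces of the 4-digit numbers.
any_three = re.findall(r'\d{3}', text)
# ['415', '555', '424', '423', '531', '541']

# Lookarounds: exactly 3 digits with no digit on either side.
standalone = re.findall(r'(?<!\d)\d{3}(?!\d)', text)
# ['415', '555', '423', '531']

# Word boundaries: same result on this input.
whole_word = re.findall(r'\b\d{3}\b', text)
# ['415', '555', '423', '531']

# finditer yields match objects, useful when positions are needed.
starts = [m.start() for m in re.finditer(r'\b\d{3}\b', text)]

# The two stricter patterns differ when letters or underscores touch the digits.
with_lookarounds = re.findall(r'(?<!\d)\d{3}(?!\d)', 'id_123')   # ['123']
with_boundaries = re.findall(r'\b\d{3}\b', 'id_123')             # []

print(any_three, standalone, whole_word, starts)
```

Which of the two stricter patterns fits depends on whether digits glued to letters or underscores should count as a match.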
Upvotes: 2
Reputation: 82765
Use re.findall.
Example:
import re
phoneNumRegex = re.compile(r'(\b\d{3}\b)')
mo = phoneNumRegex.findall('My number is 415-555-4242. 423-531-5412')
print(mo)
Output:
['415', '555', '423', '531']
Upvotes: 2