I have a regex: r'((\+91|0)?\s?\d{10})'
I'm trying to match numbers like +91 1234567890, 1234567790, 01234567890.
This number shouldn't be matched: 1234568901112, because it doesn't start with +91 or 0 and doesn't have exactly 10 digits.
When I try to use re.findall():
>>> import re
>>> re.findall(r'((\+91|0)?\s?\d{10})', '+91 1234567890, 1234567790, 01234567890, 1234568901112')
[('+91 1234567890', '+91'),
(' 1234567790', ''),
(' 0123456789', ''),
(' 1234568901', '')]
As you can see, the third and fourth items of the output are not what I want. The third item should be 01234567890, because it starts with 0 and is followed by 10 digits, but only the first 10 characters are matched. I also don't want the fourth item at all, because that number doesn't match completely: either the complete word/string matches, or it is invalid.
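A small sketch of where the truncated and partial matches come from, using only the question's own pattern and inputs. Nothing after \d{10} checks where the number ends, so the engine simply stops after ten digits. The tuples appear because re.findall returns group contents when the pattern has capturing groups; the last line uses non-capturing groups (an illustrative variant, not from the posts) to get plain strings.

```python
import re

# The question's original pattern, with two capturing groups.
ORIGINAL = r'((\+91|0)?\s?\d{10})'

# At the space, (\+91|0)? cannot match, so it is skipped; \s? takes the space
# and \d{10} takes the leading 0 plus the next nine digits, leaving the final 0.
m = re.search(ORIGINAL, ', 01234567890')
print(repr(m.group(1)))  # ' 0123456789'

# The same missing end check lets a 13-digit number match on its first ten digits.
print(re.findall(ORIGINAL, '1234568901112'))  # [('1234568901', '')]

# With non-capturing groups, findall returns the whole match as a string.
print(re.findall(r'(?:(?:\+91|0)?\s?\d{10})', '1234568901112'))  # ['1234568901']
```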
Is there any other regex that I can use? Or a function? What am I doing wrong here?
The expected output is:
['+91 1234567890', '1234567790', '01234567890']
Please let me know if any more clarifications are needed.
You may use
r'(?<!\w)(?:(?:\+91|0)\s?)?\d{10}\b'
See the regex demo.
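The point is to match these patterns as whole words. The problem is that the first part is optional and one of its alternatives, +91, starts with a non-word char, so a single leading \b word boundary won't work here: there is no word boundary between + and the space or string start before it. That is why (?<!\w) is used instead.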
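A quick comparison of the two left-hand checks on the first sample number. PLAIN_BOUNDARY is a hypothetical variant written only for this demonstration; LOOKBEHIND is the pattern from this answer.

```python
import re

# Pattern from the answer: a negative lookbehind replaces the leading \b.
LOOKBEHIND = r'(?<!\w)(?:(?:\+91|0)\s?)?\d{10}\b'
# Hypothetical variant for comparison: a plain \b at the start.
PLAIN_BOUNDARY = r'\b(?:(?:\+91|0)\s?)?\d{10}\b'

text = '+91 1234567890'

# No \b exists before '+' (non-word char at the string start), so the
# leftmost match only begins at the first digit after the space.
print(re.findall(PLAIN_BOUNDARY, text))  # ['1234567890']

# (?<!\w) only requires that no word char precedes, which holds at the start.
print(re.findall(LOOKBEHIND, text))  # ['+91 1234567890']
```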
Details
(?<!\w) - there should be no word char immediately to the left of the current location
(?:(?:\+91|0)\s?)? - an optional occurrence of
    (?:\+91|0) - +91 or 0
    \s? - an optional whitespace
\d{10}\b - ten digits followed by a word boundary, so no word char may come right after them
import re
s = '+91 1234567890, 1234567790, 012345678900, 1234568901112, 01234567890'
print(re.findall(r'(?<!\w)(?:(?:\+91|0)\s?)?\d{10}\b', s))
# => ['+91 1234567890', '1234567790', '01234567890']
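The question also asks whether a function would do. If the input is always a comma-separated list, one alternative (not from the original answer) is to validate each token with re.fullmatch, which anchors both ends, so no lookbehind or \b is needed. valid_numbers is a hypothetical helper name, and the verbose pattern's comments mirror the breakdown above.

```python
import re

# Hypothetical helper, not from the original posts: the optional prefix and
# the ten digits, written with re.VERBOSE so each part can be commented.
PHONE = re.compile(r"""
    (?:              # optional prefix group
        (?:\+91|0)   # +91 or 0
        \s?          # an optional whitespace
    )?
    \d{10}           # exactly ten digits
""", re.VERBOSE)


def valid_numbers(text):
    """Return the comma-separated tokens that are entirely a phone number."""
    tokens = (t.strip() for t in text.split(','))
    # fullmatch succeeds only if the pattern consumes the whole token,
    # so longer digit runs are rejected instead of truncated.
    return [t for t in tokens if PHONE.fullmatch(t)]


s = '+91 1234567890, 1234567790, 012345678900, 1234568901112, 01234567890'
print(valid_numbers(s))  # ['+91 1234567890', '1234567790', '01234567890']
```

This relies on the commas as separators; for numbers embedded in free text, the lookbehind pattern above is still needed.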