Reputation: 123
I've tried the following code:
import re
r = re.compile(r'''(\+)*\d* # optional + sign for international calls
([" "-\)]{,1}\d+)* # main chain of numbers, numbers separated by a space, ) or a hyphen
''',re.VERBOSE)
print(r.findall('+00 0000 0000 is my number and +44-787-77950 was my uk number'))
The expected result is:
[('+00',' 0000',' 0000'),('+44','-787','-77950')]
Or, better:
['+00 0000 0000','+44-787-77950']
But life isn't so easy; instead I get this cryptic output:
[('+', ' 0000'), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('+', '44'), ('', ''), ('', '787'), ('', ''), ('', '77950'), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', '')]
Why does it behave weirdly and how would I fix it?
Edit: my example was not the best one. I wanted the '+somenumber' part to be optional: not all the phone numbers sent to me are international, so they do not have to include a + sign.
I'm sorry for not making this clear.
So far the closest thing to what I want seems to be
import re
r = re.compile(r'''(\+)?(\d+) # optional + sign for international calls
([" "-\)]{,1}\d+)+ # main chain of numbers, numbers separated by a space, ) or a hyphen
''',re.VERBOSE)
print(r.findall('+00 0000 0000 is my number and +44-787-77950 was my uk number'))
which gives
[('+', '00', ' 0000'), ('+', '4', '4'), ('', '78', '7'), ('', '7795', '0')]
Upvotes: 1
Views: 1195
Reputation: 627087
A quick fix for your pattern is
\+?\d+(?:[- \)]+\d+)+
See the regex demo. Note the use of a non-capturing group, which keeps re.findall from returning a list of tuples.
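To see what difference the non-capturing group makes, here is a small sketch that runs the same pattern on the sample string once with a capturing group and once with a non-capturing one (the variable names are just for illustration). With a group present, re.findall returns the group's text, and for a repeated group that is only the last repetition:

```python
import re

s = "+00 0000 0000 is my number and +44-787-77950 was my uk number"

# Capturing group: findall returns only what group 1 captured,
# i.e. the LAST repetition of "separator + digits" in each match.
capturing = re.findall(r"\+?\d+([- )]+\d+)+", s)

# Non-capturing group (?:...): the pattern has no groups,
# so findall returns the whole match for each hit.
non_capturing = re.findall(r"\+?\d+(?:[- )]+\d+)+", s)

print(capturing)      # [' 0000', '-77950']
print(non_capturing)  # ['+00 0000 0000', '+44-787-77950']
```

Both patterns match exactly the same text; only what findall hands back differs.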
Details
\+? - an optional (1 or 0) plus sign
\d+ - 1+ digits
(?: - start of a non-capturing group:
[- )]+ - 1 or more -, space or ) chars
\d+ - 1+ digits
)+ - 1 or more repetitions (the whole (?:...) sequence is quantified this way, so a separator followed by digits is required at least once, and that pair can repeat)
import re
rx = r"\+?\d+(?:[- )]+\d+)+"
s = "+00 0000 0000 is my number and +44-787-77950 was my uk number"
print(re.findall(rx, s))
# => ['+00 0000 0000', '+44-787-77950']
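As a rough check against the edit in the question, the same pattern also picks up numbers without a leading +, because \+? makes the plus sign optional. The sample string below is made up for illustration:

```python
import re

rx = r"\+?\d+(?:[- )]+\d+)+"

# Hypothetical input: one international number and one without a + sign.
s = "call +44-787-77950 or 0787 779 50 tonight"
found = re.findall(rx, s)
print(found)  # ['+44-787-77950', '0787 779 50']

# (?:...)+ needs at least one "separator + digits" step,
# so a lone run of digits is not treated as a phone number.
lone = re.findall(rx, "I am 42")
print(lone)   # []
```

Whether rejecting a bare run of digits is what you want depends on your data; change the final + to * if single blocks should match too.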
Upvotes: 2
Reputation: 11318
You can use this regex to capture phone numbers; it skips the spaces or dashes and keeps just the digit blocks:
import re
s = '+00 0000 0000 is my number and +44-787-77950 was my uk number'
p = r'(\+\d+)(?:[\s-])(\d+)(?:[\s-])(\d+)'
print(re.findall(p, s))
# => [('+00', '0000', '0000'), ('+44', '787', '77950')]
The pattern means:
(\+\d+) - look for + followed by one or more digits
(?:[\s-]) - followed by a space or a dash, in a non-capturing group (it must be present, but it is not returned in the result)
(\d+) - another one or more digits
Then you can easily rebuild the phone numbers with spaces or dashes using join:
for number in re.findall(p, s):
    print('-'.join(number))
+00-0000-0000
+44-787-77950
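If you would rather keep each number exactly as written (original separators included) instead of re-joining the pieces, a sketch using re.finditer, which yields match objects rather than group tuples, could look like this:

```python
import re

s = '+00 0000 0000 is my number and +44-787-77950 was my uk number'
p = r'(\+\d+)(?:[\s-])(\d+)(?:[\s-])(\d+)'

# group(0) is the whole match with its original spaces/dashes;
# groups() gives the same tuples that re.findall returns.
whole = [m.group(0) for m in re.finditer(p, s)]
parts = [m.groups() for m in re.finditer(p, s)]

print(whole)  # ['+00 0000 0000', '+44-787-77950']
print(parts)  # [('+00', '0000', '0000'), ('+44', '787', '77950')]
```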
Upvotes: 0
Reputation: 24802
All your tokens are optional, so the regex can and will match the empty string.
I think you will want to use the following:
(\+)?(\d+)([ \-)]?\d+)
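To illustrate the point about optional tokens, here is a toy pattern (\+?\d*, not taken from the question) in which every token is optional, compared with one that requires at least one digit. The input string is made up:

```python
import re

s = "tel +44"

# Every token is optional, so the engine reports a zero-length
# match at each position where no digits start.
all_optional = re.findall(r"\+?\d*", s)
print(all_optional)  # ['', '', '', '', '+44', '']

# Requiring at least one digit (\d+) removes the empty matches.
required = re.findall(r"\+?\d+", s)
print(required)      # ['+44']
```

This is the same effect behind the long runs of ('', '') in the question's output.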
Upvotes: 0