user160738

Reputation: 123

python regex - finding phone number

I've tried the following code:

import re

r = re.compile(r'''(\+)*\d*                 # optional + sign for international calls
                   ([" "-\)]{,1}\d+)*    # main chain of numbers, numbers separated by a space, ) or a hyphen
                   ''',re.VERBOSE)
print(r.findall('+00 0000 0000 is my number and +44-787-77950 was my uk number'))

The expected result

[('+00',' 0000',' 0000'),('+44','-787','-77950')]

Or, better:

['+00 0000 0000','+44-787-77950']

But life isn't so easy; instead I get this cryptic result:

[('+', ' 0000'), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('+', '44'), ('', ''), ('', '787'), ('', ''), ('', '77950'), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', '')]

Why does it behave weirdly and how would I fix it?

Edit - my example was not the best one. I wanted the '+somenumber' part to be optional: not all the phone numbers sent to me are international, so they do not have to include a + sign.

I'm sorry for not making this clear.

So far the closest thing to what I want seems to be

import re

r = re.compile(r'''(\+)?(\d+)                 # optional + sign for international calls
                   ([" "-\)]{,1}\d+)+    # main chain of numbers, numbers separated by a space, ) or a hyphen
                   ''',re.VERBOSE)
print(r.findall('+00 0000 0000 is my number and +44-787-77950 was my uk number'))

which gives

[('+', '00', ' 0000'), ('+', '4', '4'), ('', '78', '7'), ('', '7795', '0')]

Upvotes: 1

Views: 1195

Answers (3)

Wiktor Stribiżew

Reputation: 627087

A quick fix for your pattern is

\+?\d+(?:[- \)]+\d+)+

See the regex demo. Note the use of the non-capturing group: it keeps re.findall from returning a list of tuples (or only the captured part) instead of the whole matches.

Details

  • \+? - an optional plus sign (1 or 0 occurrences)
  • \d+ - 1+ digits
  • (?: - start of a non-capturing group:
    • [- )]+ - 1 or more -, space, or ) chars
    • \d+ - 1+ digits
  • )+ - 1 or more repetitions (the whole (?:...) sequence of patterns is quantified this way, so separators followed by digits are required at least once, in that order).

Python demo:

import re
rx = r"\+?\d+(?:[- )]+\d+)+"
s = "+00 0000 0000 is my number and +44-787-77950 was my uk number"
print(re.findall(rx, s))
# => ['+00 0000 0000', '+44-787-77950']
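
To illustrate the note about the non-capturing group, here is a small side-by-side sketch. The variable names are only illustrative. When a pattern contains one capturing group, re.findall returns that group's text (and for a repeated group, only its last repetition) rather than the whole match:

```python
import re

s = "+00 0000 0000 is my number and +44-787-77950 was my uk number"

# Capturing group: findall returns only the group's text,
# and a repeated group keeps just its last repetition.
capturing = re.findall(r"\+?\d+([- )]+\d+)+", s)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r"\+?\d+(?:[- )]+\d+)+", s)

print(capturing)      # [' 0000', '-77950']
print(non_capturing)  # ['+00 0000 0000', '+44-787-77950']
```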

Upvotes: 2

Chen A.

Reputation: 11318

You can use this regex to capture phone numbers. It skips the spaces or dashes and keeps just the digit groups:

import re

s = '+00 0000 0000 is my number and +44-787-77950 was my uk number'
# raw string, so the backslashes reach the regex engine unchanged
p = r'(\+\d+)(?:[\s-])(\d+)(?:[\s-])(\d+)'

print(re.findall(p, s))
# => [('+00', '0000', '0000'), ('+44', '787', '77950')]

The pattern means:

  • (\+\d+) - look for + followed by one or more digits
  • (?:[\s-]) - followed by a space or a dash, in a non-capturing group (it has to be there, but it is not returned in the match result)
  • (\d+) - another one or more digits

Then you can easily rebuild the phone numbers with spaces or dashes using join.

for number in re.findall(p, s):
    print('-'.join(number))

+00-0000-0000
+44-787-77950
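
Note that the pattern above requires the + sign and exactly three digit groups. If the + should be optional and the number of groups can vary, one possible variant is to match the whole number first and then split it on the separators. This is only a sketch: normalize_numbers is a hypothetical helper, not part of re, and the pattern will also pick up any other run of space- or dash-separated digit groups in the text.

```python
import re

# Whole number: optional +, then two or more digit groups
# separated by whitespace or dashes.
WHOLE_NUMBER = re.compile(r'\+?\d+(?:[\s-]+\d+)+')

def normalize_numbers(text, sep='-'):
    """Hypothetical helper: find numbers and rejoin their digit groups with sep."""
    return [sep.join(re.split(r'[\s-]+', m.group(0)))
            for m in WHOLE_NUMBER.finditer(text)]

s = '+00 0000 0000 is my number and +44-787-77950 was my uk number'
print(normalize_numbers(s))                        # ['+00-0000-0000', '+44-787-77950']
print(normalize_numbers('call 0787 77950 today'))  # ['0787-77950']
```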

Upvotes: 0

Aaron

Reputation: 24802

All your tokens are optional, so the regex can and will match the empty string.

I think you will want to use the following:

(\+)?(\d+)([ \-)]?\d+)
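
To see the empty-match behaviour for yourself, here is a minimal sketch. Both patterns are simplified illustrations (non-capturing, so findall returns plain strings), not the exact patterns from the question or from this answer, and the sample string is made up:

```python
import re

s = 'call +44-787-77950'

# Every token is optional (* or ?), so the pattern can succeed
# without consuming a single character.
all_optional = re.compile(r'\+*\d*(?:[ \-)]?\d+)*')
print(all_optional.fullmatch('') is not None)  # True

# findall then reports an empty match wherever no number starts.
matches = all_optional.findall(s)
print('' in matches)               # True
print([m for m in matches if m])   # ['+44-787-77950']

# Requiring at least one digit (\d+ instead of \d*) removes the empty matches.
digit_required = re.compile(r'\+?\d+(?:[ \-)]?\d+)*')
print(digit_required.findall(s))   # ['+44-787-77950']
```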

Upvotes: 0
