user1862963


Regular expressions to extract phone numbers

I am new to regular expressions, and I am trying to write a pattern for phone numbers so that I can identify and extract them. My question can be summarized with the following simple example.

First, I try to identify whether the string contains something like (+34), which should be optional:

import re

prefixsrch = re.compile(r'(\(?\+34\)?)?')

which I test on the following string:

line0 = "(+34)"
print(prefixsrch.findall(line0))

which yields the result:

['(+34)', '']

My first question is: why does it find two occurrences of the pattern? I guess this is related to the fact that the prefix is optional, but I do not completely understand it. Anyway, now for my main question.

If I do something similar and search for a pattern of nine digits:

numsrch = re.compile(r'\d{9}')
line1 = "971756754"
print(numsrch.findall(line1))

it yields:

['971756754']

which is fine. Now I want to identify a nine-digit number, optionally preceded by (+34). As I understand it, I should do something like:

phonesrch = re.compile(r'(\(?\+34\)?)?\d{9}')

If I test it on the following strings:

line0 = "(+34)971756754"
line1 = "971756754"

print(phonesrch.findall(line0))
print(phonesrch.findall(line1))

To my surprise, this is what I get:

['(+34)']
['']

What I was expecting to get is ['(+34)971756754'] and ['971756754']. Does anybody have any insight into this? Thank you very much in advance.
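A minimal Python 3 sketch that reproduces the surprise by comparing the whole match, group(0), with the captured group, group(1). It reuses the pattern and strings from above; the names m0 and m1 are only for illustration:

```python
import re

# Same pattern as above: one capturing group around the optional prefix
phonesrch = re.compile(r'(\(?\+34\)?)?\d{9}')

line0 = "(+34)971756754"
line1 = "971756754"

m0 = phonesrch.search(line0)
m1 = phonesrch.search(line1)

# group(0) is the whole match; group(1) is only the captured prefix
print(m0.group(0), m0.group(1))  # (+34)971756754 (+34)
print(m1.group(0), m1.group(1))  # 971756754 None

# findall reports group 1, using '' when the group did not take part in the match
print(phonesrch.findall(line0))  # ['(+34)']
print(phonesrch.findall(line1))  # ['']
```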


Answers (1)

Abhijit


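Your capturing group is the problem: when a pattern contains a capturing group, re.findall returns the text captured by that group rather than the whole match. Put the country code in a non-capturing group (?:...) and wrap the entire expression in a capturing group: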

>>> import re
>>> line0 = "(+34)971756754"
>>> line1 = "971756754"
>>> re.findall(r'((?:\(?\+34\)?)?\d{9})',line0)
['(+34)971756754']
>>> re.findall(r'((?:\(?\+34\)?)?\d{9})',line1)
['971756754']
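Two other ways to get the full number, sketched in Python 3 as alternatives to the answer above rather than as its method: drop the capturing group entirely so that findall returns the whole match, or keep the original pattern and read group(0) from re.finditer. The sample string text is made up for illustration:

```python
import re

# Alternative 1: only a non-capturing group, so findall returns the whole match
phone_nocap = re.compile(r'(?:\(?\+34\)?)?\d{9}')

# Alternative 2: keep the original capturing group, but take group(0) from each match
phone_cap = re.compile(r'(\(?\+34\)?)?\d{9}')

text = "Call (+34)971756754 or 971756754."

print(phone_nocap.findall(text))  # ['(+34)971756754', '971756754']
print([m.group(0) for m in phone_cap.finditer(text)])  # same list as above
```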


 My first question is: why does it find two occurrences of the pattern?

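This is because ? means "match 0 or 1 repetitions", so the empty string is also a valid match. After findall consumes (+34), it tries again at the end of the string, where the optional group matches the empty string and produces the second result ''.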
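re.finditer makes the two matches visible by reporting where each one starts and ends. This Python 3 sketch reuses the prefixsrch pattern from the question; the name spans is only for illustration:

```python
import re

prefixsrch = re.compile(r'(\(?\+34\)?)?')

# Each entry is ((start, end), matched_text); the second match is empty, at the end of the string
spans = [(m.span(), m.group(0)) for m in prefixsrch.finditer("(+34)")]
print(spans)  # [((0, 5), '(+34)'), ((5, 5), '')]
```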
