Matching phone numbers, regex

I have phone numbers in this format:

 some_text   phone_number some_text
 some_text   (888) 501-7526 some_text

Which of these is the more Pythonic way to search for the phone numbers?

(\(\d\d\d\) \d\d\d-\d\d\d\d)

(\([0-9]+\) [0-9]+-[0-9]+)

Or is there a simpler expression to do this?

Upvotes: 4

Views: 3349

Answers (3)

Sede

Reputation: 61253

Use (\(\d{3}\)\s*\d{3}-\d{4}):

>>> import re
>>> s = "some_text   (888) 501-7526 some_text"
>>> pat = re.compile(r'(\(\d{3}\)\s*\d{3}-\d{4})')
>>> pat.search(s).group() 
'(888) 501-7526'

Explanation:

  • (\(\d{3}\)\s*\d{3}-\d{4})
    • 1st capturing group (\(\d{3}\)\s*\d{3}-\d{4})
      • \( matches the character ( literally
      • \d{3} matches a digit [0-9]
        • Quantifier: {3} Exactly 3 times
      • \) matches the character ) literally
      • \s* matches any whitespace character [\r\n\t\f ]
        • Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
      • \d{3} matches a digit [0-9]
        • Quantifier: {3} Exactly 3 times
      • - matches the character - literally
      • \d{4} matches a digit [0-9]
        • Quantifier: {4} Exactly 4 times
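
The breakdown above can also live inside the pattern itself. This is a minimal sketch, assuming the same sample string as before; it uses the standard re.VERBOSE flag, which ignores unescaped whitespace in the pattern and allows # comments. The name PHONE_RE is made up for illustration.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
# PHONE_RE is a hypothetical name chosen for this sketch.
PHONE_RE = re.compile(r"""
    (               # capturing group 1
        \( \d{3} \) # literal parentheses around exactly three digits
        \s*         # zero or more whitespace characters
        \d{3}       # exactly three digits
        -           # a literal hyphen
        \d{4}       # exactly four digits
    )
""", re.VERBOSE)

s = "some_text   (888) 501-7526 some_text"
match = PHONE_RE.search(s)
print(match.group(1))  # (888) 501-7526
```

Because of \s*, this version also accepts a number with no space after the closing parenthesis, such as (888)501-7526.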

Upvotes: 3

pzp

Reputation: 6607

I think you are looking for something like this:

(\(\d{3}\) \d{3}-\d{4})

From the Python docs:

{m}

Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{6} will match exactly six 'a' characters, but not five.

(\(\d\d\d\) \d\d\d-\d\d\d\d) would also work, but, as you said in your question, is rather repetitive. Your other suggested pattern, (\([0-9]+\) [0-9]+-[0-9]+), gives false positives on input such as (1) 2-3.
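
To see the false positive concretely, here is a small sketch that runs both patterns against a real number and against (1) 2-3. The variable names strict and loose are only labels for this example.

```python
import re

# Fixed-length pattern versus the open-ended one from the question.
strict = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
loose = re.compile(r"\([0-9]+\) [0-9]+-[0-9]+")

for text in ["(888) 501-7526", "(1) 2-3"]:
    # Print whether each pattern finds a match in the text.
    print(text, bool(strict.search(text)), bool(loose.search(text)))

# (888) 501-7526 True True
# (1) 2-3 False True
```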

Upvotes: 6

linusg

Reputation: 6439

I think the second one is the more Pythonic of the two. The first one isn't that easy to read, though regular expressions are rarely intuitive anyway.

(\([0-9]+\) [0-9]+-[0-9]+) will do if the length of the phone number is not fixed. If the length is always the same, you can use (\([0-9]{3}\) [0-9]{3}-[0-9]{4}) or (\(\d{3}\) \d{3}-\d{4}).
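
For the sample in the question, the two fixed-length patterns give the same result. One difference worth knowing, sketched below: on str patterns in Python 3, \d also matches non-ASCII decimal digits, while [0-9] does not. The Arabic-Indic test string is made up for illustration.

```python
import re

ascii_only = re.compile(r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}")
any_digit = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

sample = "some_text   (888) 501-7526 some_text"
print(ascii_only.search(sample).group())  # (888) 501-7526
print(any_digit.search(sample).group())   # (888) 501-7526

# Hypothetical input written with Arabic-Indic digits (U+0660 to U+0669).
unicode_sample = "(\u0668\u0668\u0668) \u0665\u0660\u0661-\u0667\u0665\u0662\u0666"
print(bool(ascii_only.search(unicode_sample)))  # False
print(bool(any_digit.search(unicode_sample)))   # True
```

If only ASCII digits should count, compiling the \d version with the re.ASCII flag makes it behave like [0-9].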

Upvotes: 0
