yusuf

Reputation: 3781

Extracting phone numbers from free-form text in Python using regex

I need to extract phone numbers from free-form text.

How can I do this with a regex in Python?

I found an example that extracts e-mail addresses: https://gist.github.com/dideler/5219706

I followed the same approach, substituting a phone number regex for the e-mail address regex, but I couldn't get any output.

import re

def get_phoneNumber(text):
    phone_number = ""
    regex = re.compile(r"((\(\d{3,4}\)|\d{3,4}-)\d{4,9}(-\d{1,5}|\d{0}))|(\d{4,12})")

    for phoneNumber in get_phoneNumbers(text, regex):
        phone_number = phone_number + phoneNumber + "\n"

    return phone_number

def get_phoneNumbers(s, regex):
    # findall returns one tuple of groups per match; [0] is the first group
    return (phoneNumber[0] for phoneNumber in re.findall(regex, s))

How can I manage to do it?

Upvotes: 2

Views: 12289

Answers (4)

Sharmila

Reputation: 1677

This should find all the phone numbers in a given string, including international numbers. Taking the example by @buckley, let's use the string

text="""Matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555 and all combinations thereof, like 333 4445555, (333)4445555 or 333444-5555. Does not match international notation +13334445555, but matches domestic part in +1 333 4445555."""

re.findall(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]', text)

 >>> re.findall(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]', text)
['3334445555', '333.444.5555', '333-444-5555', '333 444 5555', 
 '(333) 444 5555', '333 4445555', '(333)4445555', '333444-5555', 
 '+13334445555', '+1 333 4445555']

Basically, the regex lays out these rules

  1. The matched string may start with a + or ( symbol
  2. That has to be followed by a digit from 1 to 9
  3. It has to end with a digit from 0 to 9
  4. In between, it must contain at least eight characters drawn from 0-9, space, and .-()
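
As a rough sketch of those four rules, the same pattern can be written in verbose mode with one comment per rule. The name PHONE_RE and the sample string are only illustrative. Because the middle part needs at least eight characters, short numbers are skipped, while any other long digit run (an order number, say) would be accepted too.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each rule gets a comment.
# Whitespace inside a character class is still significant in verbose mode.
PHONE_RE = re.compile(r"""
    [+(]?             # rule 1: optional leading + or (
    [1-9]             # rule 2: first digit is 1-9
    [0-9 .\-()]{8,}   # rule 4: eight or more digits, spaces, dots, hyphens, parentheses
    [0-9]             # rule 3: last character is a digit
""", re.VERBOSE)

sample = "Call +1 333 4445555 or (333) 444 5555 today"
print(PHONE_RE.findall(sample))  # ['+1 333 4445555', '(333) 444 5555']
```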

Upvotes: 3

vilk

Reputation: 65

So I think I've got the hang of your problem.

This is what I would do in order:

  • Learn what regex is. Without the foundational knowledge you are just wasting our time and your own.
  • Watch this: https://www.youtube.com/watch?v=ZdDOauFIDkw
  • Write down what you don't know
  • Research
  • Write code, provide sample input for it, copy it to http://pastebin.com, and show it to us if it's still not working.
  • Repeat.

Upvotes: 0

buckley

Reputation: 14099

This regex matches typical phone numbers from North America

Matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555 and all combinations thereof, like 333 4445555, (333)4445555 or 333444-5555. Does not match international notation +13334445555, but matches domestic part in +1 333 4445555.

\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b

Source: RegexBuddy

The following Python code iterates over all matches

import re

# subject is the string to search
for match in re.finditer(r"\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b", subject):
    start = match.start()  # match start
    end = match.end()      # match end (exclusive)
    text = match.group()   # matched text
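
A minimal sketch of how that loop might be wrapped in a helper, assuming you want the position and text of every match. The names NANP_RE and find_nanp_numbers and the sample string are made up for illustration.

```python
import re

# The North American pattern from above, compiled once for reuse.
NANP_RE = re.compile(r"\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b")

def find_nanp_numbers(subject):
    """Return a (start, end, text) tuple for every match in subject."""
    return [(m.start(), m.end(), m.group()) for m in NANP_RE.finditer(subject)]

sample = "Call 333-444-5555 or +1 333 4445555."
for start, end, text in find_nanp_numbers(sample):
    print(start, end, text)
```

For the second number only the domestic part is reported, since the pattern does not cover the +1 prefix.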

What patterns are you expecting?

Upvotes: 6

vilk

Reputation: 65

You have to build a pattern before you can match anything with a regex. The question is: what format are you looking for?

To answer that, you should research how phone numbers actually show up in your texts.

So I'd expect you to define what you mean by matching phone numbers.

  • Is it a specific format that you are looking for, always consistent throughout the free text?
  • Or can you define the string with a pattern that matches a phone number by the country code (+xx) followed by a specific number of digits?

I just mean that there is a huge difference between:

  • I want to match phone numbers from a text that can be from any country, mobile or landline, in any format, with random spaces and (,) chars in it, or
  • I want to match phone numbers from Hungary, with a +xx(space)xxxxxxx(space) format, that is always consistent.

Summary: To build a regex pattern that matches all the phone numbers in your text, you have to know the different representations, that is, what you expect a phone number to look like. If your pattern is not correct, you might miss a lot of phone numbers.
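
To illustrate the narrow case, here is a minimal sketch. It assumes the format is exactly a +, a two-digit country code, one space, and seven digits; the name STRICT_RE and the sample text are invented, so adjust the counts to whatever your data really contains.

```python
import re

# Assumed format: "+", two-digit country code, one space, exactly seven digits.
# The lookahead rejects numbers that carry an eighth digit.
STRICT_RE = re.compile(r"\+\d{2} \d{7}(?!\d)")

text = "Office: +36 1234567, old line: +36 12345678, US: 333-444-5555"
print(STRICT_RE.findall(text))  # ['+36 1234567']
```

A pattern this strict ignores everything outside the agreed format, which is exactly why you need to decide on the format first.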

Hope this code serves a good cause, V

Upvotes: 2
