Reputation: 3781
I need to extract phone numbers from free-form text.
How can I do this with a regex in Python?
I found an example that extracts e-mail addresses: https://gist.github.com/dideler/5219706
I used the same approach with a phone number regex in place of the e-mail regex, but I get no output.
import re

def get_phoneNumber(text):
    # Join every number found into one newline-separated string.
    phone_number = ""
    regex = re.compile(r"((\(\d{3,4}\)|\d{3,4}-)\d{4,9}(-\d{1,5}|\d{0}))|(\d{4,12})")
    for phoneNumber in get_phoneNumbers(text, regex):
        phone_number = phone_number + phoneNumber + "\n"
    return phone_number

def get_phoneNumbers(s, regex):
    # findall returns a tuple of groups per match; [0] is the first group only.
    return (phoneNumber[0] for phoneNumber in re.findall(regex, s))
How can I manage to do it?
Upvotes: 2
Views: 12289
Reputation: 1677
This should find all the phone numbers in a given string, including international numbers. Taking the example by @buckley, let's use the string
text="""Matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555 and all combinations thereof, like 333 4445555, (333)4445555 or 333444-5555. Does not match international notation +13334445555, but matches domestic part in +1 333 4445555."""
re.findall(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]', text)
>>> re.findall(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]', text)
['3334445555', '333.444.5555', '333-444-5555', '333 444 5555',
'(333) 444 5555', '333 4445555', '(333)4445555', '333444-5555',
'+13334445555', '+1 333 4445555']
Basically, the regex lays out these rules
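The rules can be read directly off the pattern. Below is a minimal sketch that spells the same regex out with re.VERBOSE, one rule per line. The name PHONE_RE and the sample sentence are illustrative assumptions, not part of the original answer. Note that a pattern this permissive will also accept other long runs of digits and separators, so treat it as a starting point.

```python
import re

# Same pattern as above, split up so each rule is visible.
# PHONE_RE is a hypothetical name chosen for this sketch.
PHONE_RE = re.compile(r"""
    [+(]?             # rule 1: optional leading '+' or '('
    [1-9]             # rule 2: the first digit is 1-9, never 0
    [0-9 .\-()]{8,}   # rule 3: at least 8 more digits, spaces, dots, dashes or parentheses
    [0-9]             # rule 4: the match must end on a digit
""", re.VERBOSE)

# Made-up sample text for illustration.
sample = "Call 333-444-5555 or +1 333 4445555 today."
found = PHONE_RE.findall(sample)
print(found)  # ['333-444-5555', '+1 333 4445555']
```

Whitespace inside a character class is kept even in verbose mode, so the space in rule 3 still matches a literal space.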
Upvotes: 3
Reputation: 65
I think I've got the hang of your problem.
This is what I would do in order:
Upvotes: 0
Reputation: 14099
This regex matches typical phone numbers from North America
Matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555 and all combinations thereof, like 333 4445555, (333)4445555 or 333444-5555. Does not match international notation +13334445555, but matches domestic part in +1 333 4445555.
\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b
Source: RegexBuddy
The following Python code iterates over all matches
import re

# subject is the text to search
for match in re.finditer(r"\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    print(match.start(), match.end(), match.group())
What patterns are you expecting?
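Once the expected patterns are settled, the loop above can be wrapped in a small helper that returns the matched strings. This is only a sketch: the names NANP_RE and extract_phone_numbers and the sample text are assumptions made for illustration.

```python
import re

# The North American pattern from this answer, compiled once.
NANP_RE = re.compile(r"\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b")

def extract_phone_numbers(text):
    """Return every full match of NANP_RE in text, in order of appearance."""
    return [m.group() for m in NANP_RE.finditer(text)]

# Made-up sample text for illustration.
sample = "Office: (333) 444 5555, mobile: 333.444.5555, fax: +1 333 4445555"
numbers = extract_phone_numbers(sample)
print(numbers)  # ['(333) 444 5555', '333.444.5555', '333 4445555']
```

This pattern has no capturing groups, so re.findall would return the same strings. With a pattern that does contain groups, re.findall returns the groups instead of the whole match, which is why match.group() is the safer way to get the full text.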
Upvotes: 6
Reputation: 65
You have to build a pattern before you can match anything with a regexp. The question is: what format are you looking for?
To do this, you should research how phone numbers actually show up in your use cases.
So I'd expect you to define what you mean by matching phone numbers.
There is a huge difference between:
- I want to match phone numbers in a text that can be from any country, mobile or landline, in any format, with random spaces and (,) chars in them
- I want to match phone numbers from Hungary in a +xx(space)xxxxxxx(space) format that is always consistent
Summary: To build a regexp pattern that matches all the phone numbers in your text, you have to know the different representations, that is, what you expect a phone number to look like. If your pattern is not correct, you might miss a lot of phone numbers.
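To make the trade-off concrete, here is a rough sketch that runs a strict pattern and a loose pattern over the same text. Both patterns, their names, and the sample sentence are assumptions invented for this illustration. The strict one only encodes the +xx(space)xxxxxxx shape mentioned above.

```python
import re

# Strict: '+', two digits, a space, exactly seven digits (hypothetical pattern).
STRICT_RE = re.compile(r"\+\d{2} \d{7}(?!\d)")

# Loose: optional '+', then digits and common separators, ending on a digit
# (hypothetical pattern).
LOOSE_RE = re.compile(r"\+?\d[\d .()-]{6,}\d")

# Made-up sample text containing two phone-like strings and a date.
sample = "Call +36 1234567 or 06 1 234 5678 before 2019-10-25."

strict = STRICT_RE.findall(sample)
loose = LOOSE_RE.findall(sample)
print(strict)  # ['+36 1234567']
print(loose)   # ['+36 1234567', '06 1 234 5678', '2019-10-25']
```

The strict pattern misses the number written in a different representation, while the loose pattern picks it up but also returns the date. Which behaviour is acceptable depends on the formats you actually expect.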
Hope this code serves a good cause, V
Upvotes: 2