Reputation: 747
I have a regular expression that matches the phone numbers:
import re
phones = re.findall(r'[+(]?[0-9][0-9 \-()]{8,}[0-9]', text)
It is fairly accurate on a large raw-text dataset.
But it sometimes matches unwanted strings (year ranges and random IDs).
Ranges of years:
'2012 - 2017'
'(2011 - 2013'
'1999 02224'
'2019 2010-2015'
'2018-2018 (5'
'2004 -2009'
'1) 2005-2006'
'2011 2020'
Random ids:
'5 5 5 5'
'100032479008252'
'100006711277302'
I have ideas on how to solve these problems, for example excluding year ranges of the form:
19**|20** - 19**|20**
But I do not know how to implement these ideas as exceptions in my regular expression.
Some examples that a regular expression should catch are presented below:
380-956-425979
+38(097)877-43-88
+38(050) 284-24-20
(097) 261-60-52
380-956-425979
(068)1850063
0975533222
Upvotes: 3
Views: 973
Reputation: 1285
I suggest you write different patterns for different phone structures. I'm not sure about your phone number structures, but this matches your examples:
import re
test = '''380-956-425979
+38(097)877-43-88
+38(050) 284-24-20
(097) 261-60-52
380-956-425979
(068)1850063
0975533222'''
solution = test.split("\n")
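# Raw strings avoid invalid-escape warnings; the original p3 duplicated p1 and is dropped.
p1 = r"\+?\d{3}-\d{3}-\d{6}"                       # 380-956-425979
p2 = r"\+?(?:\d{2})?\(\d{3}\) ?\d{3}-\d{2}-\d{2}"  # +38(097)877-43-88, (097) 261-60-52
p3 = r"\+?(?:\(\d{3}\)|\d{3})\d{7}"                # (068)1850063, 0975533222
result = re.findall(f'{p1}|{p2}|{p3}', test)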
print(solution)
print(result)
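One caveat: without boundaries, the last pattern can still match the first ten digits of a longer digit run, such as the IDs in the question. A minimal sketch of one way around that, assuming a phone number is never directly adjacent to another digit, is to wrap the alternatives in digit lookarounds. The names ALTERNATIVES, PHONE and find_structured_phones are made up for this illustration:

```python
import re

# The distinct structures from the patterns above, written as raw strings.
ALTERNATIVES = [
    r"\+?\d{3}-\d{3}-\d{6}",                       # 380-956-425979
    r"\+?(?:\d{2})?\(\d{3}\) ?\d{3}-\d{2}-\d{2}",  # +38(097)877-43-88
    r"\+?(?:\(\d{3}\)|\d{3})\d{7}",                # (068)1850063, 0975533222
]

# Assumption: a real phone number is not glued to extra digits on either side,
# so a match is rejected if a digit precedes or follows it.
PHONE = re.compile(r"(?<!\d)(?:" + "|".join(ALTERNATIVES) + r")(?!\d)")


def find_structured_phones(text):
    """Return every substring of text that fits one of the known structures."""
    return PHONE.findall(text)
```

With these boundaries a 15-digit ID no longer yields a partial match, and the year ranges from the question do not fit any of the structures in the first place.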
Upvotes: 1
Reputation: 113
You could do the filtering in Python directly:
import re
if re.match("condition", "teststring") and not re.match("not-condition", "teststring"):
    print("Match!")
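A minimal sketch of this idea applied to the pattern from the question. It keeps the original regex for candidates and then rejects them in Python. The exclusion rules are assumptions and not something the question specifies: a year range is taken to be two 19xx/20xx numbers separated by spaces and/or a dash, and a phone number is taken to have 10 to 12 digits (which happens to cover the wanted examples). The names YEAR_RANGE, MIN_DIGITS, MAX_DIGITS and find_phones are hypothetical:

```python
import re

# Candidate pattern, unchanged from the question.
PHONE = re.compile(r'[+(]?[0-9][0-9 \-()]{8,}[0-9]')

# Assumed "not-condition": two years (19xx or 20xx) separated by
# whitespace and/or a dash, not embedded in a longer digit run.
YEAR_RANGE = re.compile(r'(?<!\d)(?:19|20)\d{2}\s*[-\s]\s*(?:19|20)\d{2}(?!\d)')

# Assumed digit-count bounds, chosen to fit the examples in the question.
MIN_DIGITS = 10
MAX_DIGITS = 12


def find_phones(text):
    """Return candidate matches that pass the exclusion checks."""
    results = []
    for match in PHONE.finditer(text):
        candidate = match.group()
        digit_count = len(re.sub(r'\D', '', candidate))
        if YEAR_RANGE.search(candidate):
            continue  # looks like a range of years
        if not MIN_DIGITS <= digit_count <= MAX_DIGITS:
            continue  # too short, or a long random ID
        results.append(candidate)
    return results
```

Filtering after matching keeps the regular expression itself simple. The thresholds would need tuning against the real dataset, since a phone number that happens to contain something shaped like a year range would also be dropped.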
Upvotes: 0