lemon

Reputation: 747

Regex: match one pattern and exclude another pattern

I have a regular expression that matches phone numbers:

import re
phones = re.findall(r'[+(]?[0-9][0-9 \-()]{8,}[0-9]', text)

It is fairly accurate on a large raw-text dataset.

But sometimes it matches unwanted results (ranges of years and random IDs).

Ranges of years:

'2012 - 2017'
'(2011 - 2013'
'1999                                                   02224'
'2019     2010-2015'
'2018-2018 (5'
'2004 -2009'
'1) 2005-2006'
'2011            2020'

Random ids:

'5                    5                    5                 5'
'100032479008252'
'100006711277302'

I have ideas on how to solve these problems.

  1. Limit the total number of digits to 12.
  2. Limit the total number of characters to 16.
  3. Remove the ranges of years (19**|20** - 19**|20**).

But I do not know how to implement these ideas as exceptions in my regular expression.
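
Ideas 1 and 2 are plain size checks, so one option is to leave the pattern alone and filter what re.findall returns. The sketch below is only an illustration under stated assumptions: within_limits and its parameter names are made up for this example. A cap of 16 characters would also drop two of the wanted examples listed further down ('+38(097)877-43-88' has 17 characters and '+38(050) 284-24-20' has 18), so the sketch assumes a cap of 18 instead.

```python
import re

# The pattern from the question, unchanged.
PHONE_RE = re.compile(r'[+(]?[0-9][0-9 \-()]{8,}[0-9]')


def within_limits(candidate, max_digits=12, max_chars=18):
    """Return True if the candidate respects both size limits.

    max_chars=18 is an assumption; the 16 proposed in the question
    would reject '+38(050) 284-24-20'.
    """
    digit_count = sum(ch.isdigit() for ch in candidate)
    return digit_count <= max_digits and len(candidate) <= max_chars


text = "call +38(050) 284-24-20 or id 100032479008252, 2011            2020"
phones = [m for m in PHONE_RE.findall(text) if within_limits(m)]
print(phones)  # ['+38(050) 284-24-20']
```

The 15-digit ID fails the digit limit and the widely spaced pair of years fails the length limit. Short year ranges such as '2012 - 2017' pass both checks, so they still need a separate rule (idea 3).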

Some examples that a regular expression should catch are presented below:

380-956-425979
+38(097)877-43-88
+38(050) 284-24-20
(097) 261-60-52
380-956-425979
(068)1850063
0975533222

Upvotes: 3

Views: 973

Answers (2)

Frank

Reputation: 1285

I suggest you write a separate pattern for each phone number structure. I'm not sure which structures your numbers can take, but these patterns match your examples:

import re
test = '''380-956-425979
+38(097)877-43-88
+38(050) 284-24-20
(097) 261-60-52
380-956-425979
(068)1850063
0975533222'''
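solution = test.split("\n")  # expected result: one entry per line

# Raw strings keep the backslashes intact for the regex engine.
p1 = r"\+?\d{3}-\d{3}-\d{6}"                       # 380-956-425979
p2 = r"\+?(?:\d{2})?\(\d{3}\) ?\d{3}-\d{2}-\d{2}"  # +38(097)877-43-88, (097) 261-60-52
p3 = r"\+?(?:\(\d{3}\)|\d{3})\d{7}"                # (068)1850063, 0975533222

result = re.findall(f'{p1}|{p2}|{p3}', test)
print(solution)
print(result)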
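
One caveat when running these patterns over raw text: the last alternative accepts any ten consecutive digits, so it also matches the first ten digits of a longer ID such as '100032479008252'. A possible guard, sketched below, wraps the alternation in digit-boundary lookarounds. The names unguarded and guarded are made up for this example, and the three patterns are the distinct alternatives from the answer above.

```python
import re

# The three distinct alternatives from the answer above.
p1 = r"\+?\d{3}-\d{3}-\d{6}"
p2 = r"\+?(?:\d{2})?\(\d{3}\) ?\d{3}-\d{2}-\d{2}"
p3 = r"\+?(?:\(\d{3}\)|\d{3})\d{7}"

unguarded = re.compile(f"{p1}|{p2}|{p3}")
# (?<!\d) and (?!\d) reject a match that starts or ends inside a longer run of digits.
guarded = re.compile(rf"(?<!\d)(?:{p1}|{p2}|{p3})(?!\d)")

random_id = "100032479008252"
print(unguarded.findall(random_id))  # ['1000324790']
print(guarded.findall(random_id))    # []

wanted = "380-956-425979\n+38(097)877-43-88\n(068)1850063\n0975533222"
print(guarded.findall(wanted) == wanted.split("\n"))  # True
```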

Upvotes: 1

Niwla23

Reputation: 113

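You could do it in Python directly, by combining one match with the negation of another:

import re

# "condition" and "not-condition" are placeholders for the two patterns.
if re.match("condition", "teststring") and not re.match("not-condition", "teststring"):
    print("Match!")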
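
Applied to the question, this could look roughly like the sketch below: the phone pattern from the question collects candidates, and a second pattern discards any candidate that contains a year range. YEAR_RANGE_RE and find_phones are hypothetical names, and the year pattern assumes two years from 1900 to 2099 joined by a hyphen. Pairs of years separated only by spaces, such as '2011            2020', are not covered by it.

```python
import re

# The phone pattern from the question, unchanged.
PHONE_RE = re.compile(r'[+(]?[0-9][0-9 \-()]{8,}[0-9]')

# Hypothetical exclusion pattern: 19xx or 20xx, a hyphen with optional
# surrounding whitespace, then 19xx or 20xx again.
YEAR_RANGE_RE = re.compile(r'(?<!\d)(?:19|20)\d{2}\s*-\s*(?:19|20)\d{2}(?!\d)')


def find_phones(text):
    """Keep phone-like candidates that do not contain a year range."""
    return [m for m in PHONE_RE.findall(text) if not YEAR_RANGE_RE.search(m)]


print(find_phones("worked 2012 - 2017, tel. (097) 261-60-52"))
# ['(097) 261-60-52']
```

re.search is used for the exclusion so that a range is caught anywhere inside a candidate, as in '1) 2005-2006' or '2018-2018 (5'.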

Upvotes: 0
