Reputation: 315
Hi, with Python I want to catch phone numbers in text, but I want to exclude the ones that follow the word Fax or fax.
I use the following regex, which works if the sentence begins with Fax or fax, but not if fax appears inside the sentence:
^(?!fax|Fax)(?:.*?)(?![-a-z])((?:[^0-9])((\+|00)33\s?|0|\(0\))[123456789][ \.\-]?[0-9]{2}[ \.\-]?[0-9]{2}[ \.\-]?[0-9]{2}[ \.\-]?[0-9]{2})(?![0-9])
Here is an example of the text I analyse:
text
Adresse quai du Sa fax 06 32 32 32 33 rtel – 59100 ROUBAIX| FRANCE
faTel : 0 8 99 70 1761 – Fax : 06 32 32 32 34
Mail :[email protected]
06 32 32 32 35
Fax 06 32 32 32 36
tel 06 32 32 32 37 henrg
The result of my regex is:
Match 1
Full match 5-42 `Adresse quai du Sa fax 06 32 32 32 33`
Group 1. 27-42 ` 06 32 32 32 33`
Group 2. 28-29 `0`
Match 2
Full match 72-117 `faTel : 0 8 99 70 1761 – Fax : 06 32 32 32 34`
Group 1. 102-117 ` 06 32 32 32 34`
Group 2. 103-104 `0`
Match 3
Full match 118-157 `Mail :[email protected]
06 32 32 32 35`
Group 1. 142-157 `
06 32 32 32 35`
Group 2. 143-144 `0`
Match 4
Full match 178-196 `tel 06 32 32 32 37`
Group 1. 181-196 ` 06 32 32 32 37`
Group 2. 182-183 `0`
But I don't want "06 32 32 32 34" and "06 32 32 32 33" in the result, because "fax" comes before them.
Thanks
Upvotes: 2
Views: 332
Reputation: 14680
You're using a lookahead where you need a negative lookbehind, (?<!...).
With this regex I seem to get all the phone numbers and no fax numbers:
(?<!Fax |fax )((\d\d\s){5}|((\d\s){2}(\d\d\s){2}\d{4}))
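The lookbehind pattern can be checked against the sample text from the question with a short Python 3 script. This is only a sketch: the names `pattern`, `sample` and `lookbehind_matches` are illustrative and not from the thread. Python's `re` module requires a fixed-width lookbehind, which the two four-character alternatives satisfy, but it also means only the four characters directly before a number are inspected.

```python
import re

# Pattern from the answer above. Both lookbehind alternatives are exactly
# 4 characters long, as Python's fixed-width lookbehind rule requires.
pattern = re.compile(r"(?<!Fax |fax )((\d\d\s){5}|((\d\s){2}(\d\d\s){2}\d{4}))")

# Sample text copied from the question (the e-mail address is obfuscated there).
sample = (
    "text\n"
    "Adresse quai du Sa fax 06 32 32 32 33 rtel – 59100 ROUBAIX| FRANCE\n"
    "faTel : 0 8 99 70 1761 – Fax : 06 32 32 32 34\n"
    "Mail :[email protected]\n"
    "06 32 32 32 35\n"
    "\n"
    "Fax 06 32 32 32 36\n"
    "tel 06 32 32 32 37 henrg"
)

# Group 1 may end with the whitespace matched by the final \s, so strip it.
lookbehind_matches = [m.group(1).strip() for m in pattern.finditer(sample)]
print(lookbehind_matches)
# ['0 8 99 70 1761', '06 32 32 32 34', '06 32 32 32 35', '06 32 32 32 37']
```

On this sample the numbers directly after "fax " and "Fax " are skipped, but the one after "Fax : " is still returned, because the colon separates the word from the number and the lookbehind no longer sees "Fax ".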
Upvotes: 0
Reputation: 626826
I suggest using a regex that matches what you do not need, but matches and captures what you do need:
(?i)fax\W*\d[\s\d]*|(\d[\s\d]*\d)
See the regex demo. The green highlighted items are what you need to grab. Note: the numbers you get in Group 1 will always contain at least 2 digits. You may also refine the patterns to fit further requirements while keeping the same "framework"; I simplified the regex structure to show the main concept.
Details
(?i) - case insensitive modifier
fax - the fax substring
\W* - any 0+ non-word chars (you may restrict it to only work with spaces and colons, like \s*(?::\s*)?)
\d - a digit
[\s\d]* - 0+ whitespaces or digits
| - or...
(\d[\s\d]*\d) - Group 1 (the value you need)
    \d - a digit
    [\s\d]* - 0+ whitespaces or digits
    \d - a digit

In Python 2, use
import re
rx = r"(?i)fax\W*\d[\s\d]*|(\d[ \d]*\d)"
s ="text\nAdresse quai du Sa fax 06 32 32 32 33 rtel – 59100 ROUBAIX| FRANCE\nfaTel : 0 8 99 70 1761 – Fax : 06 32 32 32 34\nMail :[email protected]\n06 32 32 32 35\n\nFax 06 32 32 32 36\ntel 06 32 32 32 37 henrg"
res = filter(None, re.findall(rx, s))
print(res)
# => ['59100', '0 8 99 70 1761', '06 32 32 32 35', '06 32 32 32 37']
See the Python 2 demo
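In Python 3, `filter` returns a lazy iterator rather than a list, so a list comprehension is a simple way to get the same result. The sketch below uses the same regex and sample text. The final 10-digit filter and the names `numbers` and `phones` are assumptions added for illustration, not part of the answer above.

```python
import re

# First alternative: consume "fax" plus the digits after it, capturing nothing.
# Second alternative: capture any other run of digits and spaces in group 1.
rx = re.compile(r"(?i)fax\W*\d[\s\d]*|(\d[ \d]*\d)")

s = (
    "text\n"
    "Adresse quai du Sa fax 06 32 32 32 33 rtel – 59100 ROUBAIX| FRANCE\n"
    "faTel : 0 8 99 70 1761 – Fax : 06 32 32 32 34\n"
    "Mail :[email protected]\n"
    "06 32 32 32 35\n"
    "\n"
    "Fax 06 32 32 32 36\n"
    "tel 06 32 32 32 37 henrg"
)

# findall returns group 1 for each match; it is '' whenever the fax branch
# matched, so empty strings are dropped.
numbers = [n for n in rx.findall(s) if n]
print(numbers)
# ['59100', '0 8 99 70 1761', '06 32 32 32 35', '06 32 32 32 37']

# Assumed extra step: keep only candidates with exactly 10 digits, which
# removes the postal code '59100' on this sample.
phones = [n for n in numbers if sum(c.isdigit() for c in n) == 10]
print(phones)
# ['0 8 99 70 1761', '06 32 32 32 35', '06 32 32 32 37']
```

One thing to watch: the fax branch uses `[\s\d]*`, so it can run across a line break and swallow a number that starts on the very next line. If that matters for your data, `[ \d]*` in the fax branch keeps the match on one line.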
Upvotes: 2