Match an occurrence starting with two or three digits but not containing a specific pattern somewhere

Question

I have the following lines:

 12(3)/FO.2-3;1-2
 153/G6S.3-H;2-3;1-2
 1/G13S.2-3
 22/FO.2-3;1-2
 12(3)2S/FO.2-3;1-2
 153/SH/G6S.3-H;2-3;1-2
 45/3/H/GDP6;2-3;1-2

I want a match when the line starts with two or three digits, but not just one. In addition, if the field contains FO, SH, GDP or LDP anywhere, it should not count as an occurrence. So, from the lines above, the only match should be 153/G6S.3-H;2-3;1-2, because the others either contain FO, SH or GDP, or start with a single digit.

I tried using

^[1-9][1-9]((?!FO|SH|GDP).)*$

I am getting the correct result, but I am not sure the pattern itself is correct; I am not an expert in regular expressions.

Patrick Artner · Accepted Answer

You need to add any other characters that might be between your starting digits and the things you want to exclude:

Simplified regex: ^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$

will only match 153/G6S.3-H;2-3;1-2 from your given data.

Explanation:

^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$
-----------  2 to 3 digits (1-9 only, so 0 is excluded) at start of line

^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$
            --------------------- negative lookahead: fail if any characters followed by FO, SH, GDP or LDP come next

^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$
                                  ---  match till end of line

The (?!...) construct is a negative lookahead, and on its own it only checks the text directly at its position. Other characters can sit between your leading digits and the text you do not want to see, so the lookahead needs its own .* to look past them.
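To see why the .* inside the lookahead matters, here is a minimal sketch that compares the two forms on one of the sample lines. The names `adjacent` and `anywhere` are only illustrative labels, not part of the original answer.

```python
import re

line = "22/FO.2-3;1-2"

# Lookahead without .*: only inspects the characters right after the digits
# ("/F..."), so it does not notice the "FO" further along the line.
adjacent = re.compile(r"^[1-9]{2,3}(?!FO|SH|GDP|LDP).*$")

# Lookahead with .*: may skip any number of characters before an excluded token.
anywhere = re.compile(r"^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$")

adjacent_hit = adjacent.match(line) is not None
anywhere_hit = anywhere.match(line) is not None

print(adjacent_hit)  # True  (line is wrongly accepted)
print(anywhere_hit)  # False (line is rejected as intended)
```

The only difference between the two patterns is the .* at the start of the lookahead body.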

See https://regex101.com/r/j4SRoQ/1 for more explanations (uses {2,}).

Full code example:

import re

regex = r"^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$"

test_str = r"""12(3)/FO.2-3;1-2
153/G6S.3-H;2-3;1-2
1/G13S.2-3
22/FO.2-3;1-2
12(3)2S/FO.2-3;1-2
153/SH/G6S.3-H;2-3;1-2
45/3/H/GDP6;2-3;1-2"""

matches = re.finditer(regex, test_str, re.MULTILINE)

for match in matches: 
    print(match.group())

Output:

153/G6S.3-H;2-3;1-2
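One thing to be aware of: because the pattern ends in .*, a line with four or more leading digits still matches (the first two or three digits satisfy {2,3} and the rest is consumed by .*), and [1-9] rejects any leading 0. If that is not what you want, the sketch below shows a possible stricter variant. It assumes, beyond what the question states, that a fourth digit should be rejected and that 0 counts as a digit; the `strict` pattern and the extra sample lines are hypothetical.

```python
import re

original = re.compile(r"^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$")

# Hypothetical stricter variant: \d also accepts 0, and (?!\d) forbids
# another digit directly after the two or three leading digits.
strict = re.compile(r"^\d{2,3}(?!\d)(?!.*(?:FO|SH|GDP|LDP)).*$")

samples = [
    "153/G6S.3-H;2-3;1-2",  # from the question
    "1534/G6S.3-H",         # made-up line: four leading digits
    "10/G6S.2-3",           # made-up line: contains a 0
    "22/FO.2-3;1-2",        # from the question, contains FO
]

# Map each line to (matches original pattern, matches strict pattern).
results = {s: (bool(original.match(s)), bool(strict.match(s))) for s in samples}

for s, (orig_hit, strict_hit) in results.items():
    print(f"{s!r}: original={orig_hit}, strict={strict_hit}")
```

Both patterns agree on the lines from the question; they only differ on the two made-up lines.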
