Reputation: 87
I have the following lines:
12(3)/FO.2-3;1-2
153/G6S.3-H;2-3;1-2
1/G13S.2-3
22/FO.2-3;1-2
12(3)2S/FO.2-3;1-2
153/SH/G6S.3-H;2-3;1-2
45/3/H/GDP6;2-3;1-2
I need to get a match if the line begins with two or three digits, but not just one. In addition, if the line contains FO, SH, GDP or LDP anywhere, it should not count as a match. So, from the lines above, only 153/G6S.3-H;2-3;1-2 should match, because the others either contain FO, SH or GDP, or start with a single digit.
I tried using
^[1-9][1-9]((?!FO|SH|GDP).)*$
I am getting the correct result, but I am not sure the pattern itself is correct; I am not very experienced with regular expressions.
Upvotes: 3
Views: 365
Reputation: 51643
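Your pattern does work on the sample lines, although it leaves out LDP. A simpler way to write it is to let a single lookahead skip over any characters that might sit between your starting digits and the things you want to exclude: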
Simplified regex:
^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$
This matches only 153/G6S.3-H;2-3;1-2 from your given data.
Explanation:
^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$
----------- 2 to 3 digits (1-9) at start of line; any further digits are consumed by the trailing .*
^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$
           ----------------------- negative lookahead: no FO, SH, GDP or LDP anywhere in the rest of the line
^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$
                                  --- match till end of line
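The (?!...) construct is a negative lookahead (the inner (?:...) is only a non-capturing group). On its own, a lookahead tests just the text directly at its position. Since you have other characters between the starting digits and the strings you do not want to see, the .* inside the lookahead is what lets it scan the rest of the line for them.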
See https://regex101.com/r/j4SRoQ/1 for more explanations (the demo there uses {2,}).
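To make the role of the .* inside the lookahead concrete, here is a small sketch that runs three variants over the sample lines. The names anchored, scanning, tempered and matching are made up for this illustration, and tempered is the pattern from the question with LDP added, not something taken from the linked demo.

```python
import re

lines = [
    "12(3)/FO.2-3;1-2",
    "153/G6S.3-H;2-3;1-2",
    "1/G13S.2-3",
    "22/FO.2-3;1-2",
    "12(3)2S/FO.2-3;1-2",
    "153/SH/G6S.3-H;2-3;1-2",
    "45/3/H/GDP6;2-3;1-2",
]

# Lookahead without .*: only inspects the text right after the leading digits.
anchored = re.compile(r"^[1-9]{2,3}(?!FO|SH|GDP|LDP).*$")
# Lookahead with .*: scans the rest of the line for an excluded string.
scanning = re.compile(r"^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$")
# Lookahead repeated before every character (the question's approach, plus LDP).
tempered = re.compile(r"^[1-9][1-9](?:(?!FO|SH|GDP|LDP).)*$")


def matching(pattern):
    """Return the sample lines that the compiled pattern matches."""
    return [line for line in lines if pattern.match(line)]


print(len(matching(anchored)))  # 6: every line except "1/G13S.2-3" slips through
print(matching(scanning))       # ['153/G6S.3-H;2-3;1-2']
print(matching(tempered))       # ['153/G6S.3-H;2-3;1-2']
```

On this data the scanning and per-character variants agree; the scanning form is mainly shorter to read.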
Full code example:
import re
regex = r"^[1-9]{2,3}(?!.*(?:FO|SH|GDP|LDP)).*$"
test_str = r"""12(3)/FO.2-3;1-2
153/G6S.3-H;2-3;1-2
1/G13S.2-3
22/FO.2-3;1-2
12(3)2S/FO.2-3;1-2
153/SH/G6S.3-H;2-3;1-2
45/3/H/GDP6;2-3;1-2"""
# re.MULTILINE makes ^ and $ match at the start and end of every line
matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    print(match.group())
Output:
153/G6S.3-H;2-3;1-2
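If the lookahead feels hard to maintain, the same filter can probably be written as two plain checks instead. This is only a sketch of an alternative, not part of the pattern above: keep_line, EXCLUDED and LEADING_DIGITS are hypothetical names. It relies on the excluded strings containing no digits, so testing the whole line gives the same result as testing only the part after the leading digits.

```python
import re

# Substrings that disqualify a line (taken from the question).
EXCLUDED = ("FO", "SH", "GDP", "LDP")
# Two or three digits in the range 1-9 at the start of the line.
LEADING_DIGITS = re.compile(r"[1-9]{2,3}")


def keep_line(line):
    """True if the line starts with 2-3 digits and contains no excluded substring."""
    if LEADING_DIGITS.match(line) is None:
        return False
    return not any(token in line for token in EXCLUDED)


sample = [
    "12(3)/FO.2-3;1-2",
    "153/G6S.3-H;2-3;1-2",
    "1/G13S.2-3",
    "22/FO.2-3;1-2",
    "12(3)2S/FO.2-3;1-2",
    "153/SH/G6S.3-H;2-3;1-2",
    "45/3/H/GDP6;2-3;1-2",
]

kept = [line for line in sample if keep_line(line)]
print(kept)  # ['153/G6S.3-H;2-3;1-2']
```

Adding another excluded string then only means extending the EXCLUDED tuple.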
Upvotes: 2