Reputation: 85
I'm trying to find any occurrence of "fiction" preceded or followed by anything, except for "non-".
I tried:
.*[^(n-)]fiction.*
but it's not working as I want it to. Can anyone help me out?
Upvotes: 1
Views: 593
Reputation: 626689
You should avoid patterns starting with .* because they cause too many backtracking steps and slow down code execution.
In Python, you can always get lines either by reading a file line by line or by splitting a string with splitlines(), and then pick the lines you need by testing them against a pattern without any .*.
final_output = []
with open(filepath, 'r', newline="\n", encoding="utf8") as f:
    for line in f:
        if "fiction" in line and "non-fiction" not in line:
            final_output.append(line.strip())
Or, to also get the lines that contain non-fiction as long as fiction appears somewhere with no non- in front, use a slightly modified version of @jlesuffleur's regex:
import re
final_output = []
rx = re.compile(r'\b(?<!non-)fiction\b')
with open(filepath, 'r', newline="\n", encoding="utf8") as f:
    for line in f:
        if rx.search(line):
            final_output.append(line.strip())
Both approaches applied to a multiline string:
import re
text = "Your input string line 1\nLine 2 with fiction\nLine 3 with non-fiction\nLine 4 with fiction and non-fiction"
rx = re.compile(r'\b(?<!non-)fiction\b')
# Regex approach: returns any line containing fiction with no non- prefix,
# even if the line also contains non-fiction:
final_output = [line.strip() for line in text.splitlines() if rx.search(line)]
# => ['Line 2 with fiction', 'Line 4 with fiction and non-fiction']
# Non-regex approach: drops every line that contains non-fiction,
# even if the line also contains fiction with no non- prefix:
final_output = [line.strip() for line in text.splitlines() if "fiction" in line and "non-fiction" not in line]
# => ['Line 2 with fiction']
See a Python demo.
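To illustrate the point about dropping the .* wrappers, here is a minimal sketch (the variable names are only illustrative). re.search already scans the whole string, so the wrapped and unwrapped patterns select the same line. The only difference is what the match object captures.

```python
import re

line = "Line 2 with fiction"

# re.search scans the whole string, so no leading or trailing .* is needed.
without_wildcards = re.search(r"\b(?<!non-)fiction\b", line)

# The wrapped pattern selects the same line, but the match spans the entire line
# and the engine has to backtrack from the end of the line to find "fiction".
with_wildcards = re.search(r".*\b(?<!non-)fiction\b.*", line)

print(without_wildcards.group())  # fiction
print(with_wildcards.group())     # Line 2 with fiction
```

If you only need to know whether a line qualifies, the truthiness of the unwrapped search result is enough.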
Upvotes: 2
Reputation: 1253
What about a negative lookbehind?
import re
s = 'fiction non-fiction'
res = re.findall("(?<!non-)fiction", s)
print(res)  # ['fiction']
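As a rough sketch of how the lookbehind behaves on a few edge cases (the sample strings and the names plain, bounded and samples are made up for illustration): without word boundaries the pattern also hits "fictional" and "nonfiction", so adding \b on both sides may be closer to what you want.

```python
import re

# Plain negative lookbehind, as in the snippet above.
plain = re.compile(r"(?<!non-)fiction")
# Variant with word boundaries, so only the whole word "fiction" counts.
bounded = re.compile(r"\b(?<!non-)fiction\b")

samples = ["fiction", "non-fiction", "science fiction", "fictional", "nonfiction"]

plain_hits = [s for s in samples if plain.search(s)]
bounded_hits = [s for s in samples if bounded.search(s)]

print(plain_hits)    # ['fiction', 'science fiction', 'fictional', 'nonfiction']
print(bounded_hits)  # ['fiction', 'science fiction']
```

Note that "nonfiction" (no hyphen) passes the plain lookbehind because the text before "fiction" is "non", not "non-".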
Upvotes: 1