Atheer

Reputation: 85

Regex to exclude a specific pattern in Python

I'm trying to find any occurrence of "fiction" preceded or followed by anything, except when it is preceded by "non-".

I tried:

.*[^(n-)]fiction.*

but it's not working as I want it to. Can anyone help me out?

example

Upvotes: 1

Views: 593

Answers (3)

Wiktor Stribiżew

Reputation: 626689

You should avoid patterns starting with .*: they cause too many backtracking steps and slow down the code execution.
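
As a minimal sketch (the sample strings are made up for this example), re.search with the bare lookbehind pattern makes the same keep/skip decision as re.match with the .*-wrapped pattern, so the wrapper is not needed just to test a line:

```python
import re

# Hypothetical sample lines, not taken from the question.
samples = [
    "science fiction",
    "non-fiction",
    "fiction and non-fiction",
    "no match here",
]

wrapped = re.compile(r".*(?<!non-)fiction.*")  # pattern wrapped in .*
bare = re.compile(r"(?<!non-)fiction")         # same test without .*

# re.search scans the line itself, so the leading .* adds nothing to the decision.
decisions = [(bool(wrapped.match(s)), bool(bare.search(s))) for s in samples]
print(decisions)
# [(True, True), (False, False), (True, True), (False, False)]
```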

In Python, you can always get the lines either by reading a file line by line or by splitting a multiline string with splitlines(), and then keep the ones you need by testing each line against a pattern that has no .* in it.

  1. Reading a file line by line:
final_output = []
with open(filepath, 'r', newline="\n", encoding="utf8") as f:
  for line in f:
    if "fiction" in line and "non-fiction" not in line:
      final_output.append(line.strip())

Or, to also get lines that contain non-fiction as long as they contain fiction without a non- prefix, use a slightly modified version of @jlesuffleur's regex:

import re
final_output = []
rx = re.compile(r'\b(?<!non-)fiction\b')
with open(filepath, 'r', newline="\n", encoding="utf8") as f:
  for line in f:
    if rx.search(line):
      final_output.append(line.strip())
  2. Getting lines from a multiline string (with both approaches mentioned above):
import re
text = "Your input string line 1\nLine 2 with fiction\nLine 3 with non-fiction\nLine 4 with fiction and non-fiction"
rx = re.compile(r'\b(?<!non-)fiction\b')
# Regex approach: returns any line containing fiction with no non- prefix,
# even if the line also contains non-fiction:
final_output = [line.strip() for line in text.splitlines() if rx.search(line)]
# => ['Line 2 with fiction', 'Line 4 with fiction and non-fiction']
# Non-regex approach: skips every line that contains non-fiction,
# even if it also contains fiction with no non- prefix:
final_output = [line.strip() for line in text.splitlines() if "fiction" in line and "non-fiction" not in line]
# => ['Line 2 with fiction']

See a Python demo.

Upvotes: 2

Cute Panda

Reputation: 1498

Check if this works for you:

.*(?<!non\-)fiction.*
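
As a rough sketch of how this pattern could be used to pull whole lines out of a multiline string (the sample text and the added ^...$ anchors with re.MULTILINE are assumptions for illustration, not part of the pattern above):

```python
import re

# Hypothetical sample text for illustration.
text = "Line 2 with fiction\nLine 3 with non-fiction\nLine 4 with fiction and non-fiction"

# . does not match a newline by default, so each match stays within one line.
pattern = re.compile(r"^.*(?<!non\-)fiction.*$", re.MULTILINE)
matching_lines = pattern.findall(text)
print(matching_lines)
# ['Line 2 with fiction', 'Line 4 with fiction and non-fiction']
```

Line 4 is kept because its first fiction is not preceded by non-, even though the line also contains non-fiction.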

Upvotes: 2

jlesuffleur

Reputation: 1253

What about a negative lookbehind?

import re
s = 'fiction non-fiction'
res = re.findall(r"(?<!non-)fiction", s)
print(res)  # ['fiction']
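
To see which occurrences are picked up, a small sketch with re.finditer may help (the sample string is made up). Without word boundaries the pattern also matches inside longer words such as fictional; adding \b on both sides is one way to restrict it to the whole word:

```python
import re

s = "science fiction and non-fiction, fictional"  # hypothetical sample

# (start index, matched text) for each occurrence not preceded by "non-"
loose = [(m.start(), m.group()) for m in re.finditer(r"(?<!non-)fiction", s)]
# Same, but requiring word boundaries around "fiction"
strict = [(m.start(), m.group()) for m in re.finditer(r"\b(?<!non-)fiction\b", s)]

print(loose)   # [(8, 'fiction'), (33, 'fiction')]
print(strict)  # [(8, 'fiction')]
```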

Upvotes: 1
