Mengo

Reputation: 1267

Regex: How to not match word A if word B is somewhere before A

I'm using the Python regex engine. Given a string such as foo,fou,bar,baz, I want to match baz if and only if fou does not appear anywhere before it. I tried a negative lookbehind, (?<!fou)baz, but it doesn't work because it only checks the text immediately preceding baz.

Upvotes: 2

Views: 454

Answers (2)

Jan

Reputation: 43169

You can even use string methods if you want to avoid the regex overhead.

string = """
foo,fou,bar,baz
foo,baz
baz, fou
neither nor"""

needle = "baz"
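matches = [line
    for line in string.split("\n")
    for fou in [line.find('fou')]   # index of 'fou', or -1 if absent
    for baz in [line.find(needle)]  # index of needle, or -1 if absent
    # keep the line if needle is present and no 'fou' comes before it
    if baz != -1 and (fou == -1 or baz < fou)]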

print(matches)
# ['foo,baz', 'baz, fou']

To bind an intermediate value x inside a list comprehension, you can iterate over a one-element list: for x in [...].
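If the one-element-list idiom feels too clever, the same check can be pulled into a small helper. This is only a sketch: the function name baz_without_fou_before and its parameters needle and blocker are made up for illustration, and it compares the first occurrence of each word.

```python
def baz_without_fou_before(line, needle="baz", blocker="fou"):
    """True if needle occurs and its first occurrence precedes the first
    blocker (or there is no blocker at all)."""
    needle_at = line.find(needle)    # index of first needle, -1 if absent
    blocker_at = line.find(blocker)  # index of first blocker, -1 if absent
    if needle_at == -1:
        return False
    return blocker_at == -1 or needle_at < blocker_at


lines = ["foo,fou,bar,baz", "foo,baz", "baz, fou", "neither nor"]
matches = [line for line in lines if baz_without_fou_before(line)]
print(matches)
```

For these four sample lines it keeps 'foo,baz' and 'baz, fou'.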

Upvotes: 0

heemayl

Reputation: 41987

The re module does not support variable-length lookbehinds; you need the third-party regex module for that.
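A quick way to see that limitation for yourself, as a rough sketch using only the standard library (the variable names fixed and compiled are just for this illustration):

```python
import re

text = "foo,fou,bar,baz"

# A fixed-width lookbehind only inspects the three characters right before
# "baz" (here "ar,"), so the earlier "fou" is never seen and baz still matches.
fixed = re.search(r"(?<!fou)baz", text)
print(fixed.group(0) if fixed else None)

# A variable-length lookbehind is rejected by re when the pattern is compiled.
try:
    re.compile(r"(?<!fou.*)baz")
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```

The first search still finds baz, which is the behaviour described in the question, and the second pattern fails to compile at all.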

To get what you want with the re module, you can use a negative lookahead to reject any string in which fou comes before baz, and a capturing group to get baz:

In [14]: import re

In [15]: str_ = 'foo,fou,bar,baz'

In [16]: re.search(r'^(?!.*fou.*baz).*(baz)', str_)

In [17]: str_ = 'foo,foz,bar,baz'

In [18]: re.search(r'^(?!.*fou.*baz).*(baz)', str_)
Out[18]: <_sre.SRE_Match object; span=(0, 15), match='foo,foz,bar,baz'>

In [19]: re.search(r'^(?!.*fou.*baz).*(baz)', str_).group(1)
Out[19]: 'baz'

In ^(?!.*fou.*baz).*(baz):

  • The zero-width negative lookahead, (?!.*fou.*baz), is evaluated once at the start of the string (because of ^) and makes sure fou does not come before baz anywhere in the input

  • .*(baz) then puts baz in the only capturing group
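Putting the pattern into a small function makes it easier to try on several inputs. This is a sketch only: the names PATTERN and find_baz are hypothetical, and it assumes single-line input, since . does not match newlines by default.

```python
import re

# Same pattern as above: the lookahead runs once at the start of the string
# and fails if "fou" appears anywhere before some "baz".
PATTERN = re.compile(r"^(?!.*fou.*baz).*(baz)")


def find_baz(text):
    """Return 'baz' if text contains it and no 'fou' precedes any 'baz',
    otherwise None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


for sample in ["foo,fou,bar,baz", "foo,foz,bar,baz", "baz,fou", "foo,bar"]:
    print(sample, "->", find_baz(sample))
```

Note that an input like baz,fou,baz is rejected as a whole, because the lookahead sees a fou before the second baz. If the first baz should still count in that case, the pattern needs a different design.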

Upvotes: 2
