nzaleski

Reputation: 441

Negative Lookahead RegEx

Why is this not working? I am trying to use a negative lookahead to pull the numbers from the bins, except for the quarantine bin and the inspection bin. With the ^ in front, it matches all numbers in parentheses. When I remove the ^, it matches nothing.

Also, can you use the "or" operator | inside the negative lookahead? I want to have ^(?!Quarantine_Bin|Inspection_Bin)

I also tried to negate it specifically with [^Quarantine_Bin], and it still matches.

^(?!Quarantine_Bin)\([0-9]+\)

Data

    Quarantine(2),Other_Bin(2),Quarantine_Bin(2),Quarantine_Bin(2),
    Quarantine_Bin(5),Inspection_Bin(3),Regular_Bin(5),other(2)

Upvotes: 3

Views: 1009

Answers (4)

zdim

Reputation: 66881

What you need is a negative lookbehind:

use strict;
use warnings;
use feature 'say';

my @strings = (
    "Quarantine_Bin(5),Inspection_Bin(3),Regular_Bin(5),other(2)",
    "Quarantine(2),Other_Bin(2),Quarantine_Bin(2),Quarantine_Bin(2),"
);

for (@strings) {
    # Match "(N)" only when it is not preceded by Quarantine_Bin or Inspection_Bin
    my @m = $_ =~ /(?<!\b(?:Quarantine|Inspection)_Bin)\(\d+\)/g;
    say "@m";
}

The ^ anchor doesn't do what you want here; use \b to specify a word boundary.

This includes the parentheses with the numbers, returning lines (5) (2) and (2) (2).

If you'd rather omit them, add capturing parentheses around the numbers

/(?<! \b(?: Quarantine|Inspection)_Bin ) \( (\d+) \)/xg;

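or pull the opening paren inside the lookbehind (so it is not consumed) and leave out the closing one. The \b before \d+ keeps the match from starting in the middle of a multi-digit number, where the lookbehind would no longer see the bin name:

/(?<! \b(?: Quarantine|Inspection)_Bin \( ) \b\d+/xg;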

These return lines 5 2 and 2 2, no parens.

The /x modifier allows spaces inside for readability.
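
As a minimal sketch of the capturing variant in use, the snippet below wraps it in a helper. The sub name bin_counts and the multi-digit sample input are assumptions made for illustration; they are not part of the original data.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: returns the numbers for every bin except
# Quarantine_Bin and Inspection_Bin, using the capturing variant.
sub bin_counts {
    my ($line) = @_;
    # In list context, /g with one capture group returns all captures.
    return $line =~ /(?<! \b(?: Quarantine|Inspection)_Bin ) \( (\d+) \)/xg;
}

# Sample input made up for illustration (includes multi-digit counts).
my @counts = bin_counts("Quarantine_Bin(25),Regular_Bin(12),other(7)");
say "@counts";    # prints "12 7"
```

Because the whole parenthesized number is consumed by the match, this form behaves the same for multi-digit counts as for single digits.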

Upvotes: 5

Sobrique

Reputation: 53478

It's in the comment, so I'll flesh it out as an actual answer.

I would suggest generally avoiding lookahead/lookbehind regexes, because they can get complicated and messy. In your use case, I'd probably just split the line into an array and handle each element individually.

Something like:

#!/usr/bin/env perl
use strict;
use warnings;

while ( <DATA> ) { 
    chomp;
    #split on comma;
    #grep out Inspection_Bin and Quarantine_Bin
    my @fields = grep { not m/(?:Quarantine|Inspection)_Bin/ } split /,/;
    #iterate each field, and select out two different regex matches, e.g.
    #word bit and number bit. 
    print m/^(\w+)/, "=>", m/\((\d+)\)/, "\n" for @fields;
}


__DATA__
Quarantine(2),Other_Bin(2),Quarantine_Bin(2),Quarantine_Bin(2),Quarantine_Bin(5),Inspection_Bin(3),Regular_Bin(5),other(2)

Upvotes: 1

ikegami

Reputation: 385857

^(?!Quarantine_Bin)\([0-9]+\) checks that the start of the string isn't followed by Quarantine_Bin but is followed by \([0-9]+\). That can only succeed when the string itself begins with a parenthesized number, which is never the case for your data.

[^Quarantine_Bin] matches a single character that isn't B, Q, a, e, i, n, r, t, u or _. Not what you want.
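
Both points can be checked directly. The following is only a small sketch; the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

my $re = qr/^(?!Quarantine_Bin)\([0-9]+\)/;

# The anchored pattern needs "(" as the very first character.
say "(5),other(2)"            =~ $re ? "match" : "no match";    # match
say "Regular_Bin(5),other(2)" =~ $re ? "match" : "no match";    # no match

# [^Quarantine_Bin] is a character class: one character not in the set.
say "x" =~ /^[^Quarantine_Bin]$/ ? "match" : "no match";        # match
say "Q" =~ /^[^Quarantine_Bin]$/ ? "match" : "no match";        # no match
```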


Without the filtering, you'd have

\b\w+\([0-9]+\)

You want to ensure the \b isn't followed by Quarantine_Bin or Inspection_Bin, so you can use

\b(?!Quarantine_Bin\b)(?!Inspection_Bin\b)\w+\([0-9]+\)

or

\b(?!(?:Quarantine|Inspection)_Bin\b)\w+\([0-9]+\)

The \b within the lookahead prevents Quarantine_Bin_X from being filtered out.
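
A minimal sketch of the combined pattern applied to a line of data follows. The capture groups, the sub name kept_bins, and the Quarantine_Bin_X entry in the sample are additions for illustration only.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: returns "name=count" strings for every bin
# except Quarantine_Bin and Inspection_Bin.
sub kept_bins {
    my ($line) = @_;
    my @out;
    while ($line =~ /\b(?!(?:Quarantine|Inspection)_Bin\b)(\w+)\(([0-9]+)\)/g) {
        push @out, "$1=$2";
    }
    return @out;
}

# Prints Quarantine=2, Quarantine_Bin_X=4 and Regular_Bin=5, one per line.
say for kept_bins(
    "Quarantine(2),Quarantine_Bin(2),Inspection_Bin(3),Quarantine_Bin_X(4),Regular_Bin(5)"
);
```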


Useful:

(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
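
To illustrate the analogy, here is a short sketch with END standing in for STRING and a comma standing in for CHAR; both sample strings are made up.

```perl
use strict;
use warnings;
use feature 'say';

# Capture everything up to, but not including, the first "END".
my ($before) = "abcENDdefEND" =~ /^((?:(?!END).)*)/;
say $before;    # prints "abc"

# Single-character analogue: everything up to the first comma.
my ($field) = "Regular_Bin(5),other(2)" =~ /^([^,]*)/;
say $field;     # prints "Regular_Bin(5)"
```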

Upvotes: 2

anubhava

Reputation: 785196

You should be using a negative lookbehind:

(?<!\b(Quarantine|Inspection)_Bin)\([0-9]+\)


(?<!\b(Quarantine|Inspection)_Bin) is a negative lookbehind that makes the match fail if Quarantine_Bin or Inspection_Bin immediately precedes it. \b is a word boundary.

Upvotes: 4
