Reputation: 441
Why is this not working? I am trying to use a negative lookahead. I want to pull the numbers from the bins, except for the quarantine bin and the inspection bin. When I run the code with the ^ in front, it matches all numbers in parentheses. When I remove the ^, it matches nothing.
Also, can you use the "or operator |" inside the negative lookahead? I want to have ^(?!Quarantine_Bin|Inspection_Bin)
I also tried to specifically negate [^Quarantine_Bin] and it is still matching.
^(?!Quarantine_Bin)\([0-9]+\)
Data
Quarantine(2),Other_Bin(2),Quarantine_Bin(2),Quarantine_Bin(2),
Quarantine_Bin(5),Inspection_Bin(3),Regular_Bin(5),other(2)
Upvotes: 3
Views: 1009
Reputation: 66881
It's a negative lookbehind
use strict;
use warnings;
use feature 'say';

my @strings = (
    "Quarantine_Bin(5),Inspection_Bin(3),Regular_Bin(5),other(2)",
    "Quarantine(2),Other_Bin(2),Quarantine_Bin(2),Quarantine_Bin(2),"
);

for (@strings) {
    # Match "(digits)" only when not directly preceded by Quarantine_Bin or Inspection_Bin
    my @m = $_ =~ /(?<!\b(?:Quarantine|Inspection)_Bin)\(\d+\)/g;
    say "@m";
}
The ^ anchor doesn't do what you want here; use \b to specify a word boundary.
This includes the parentheses with the numbers, printing the lines (5) (2) and (2) (2).
If you'd rather omit them, add capturing parentheses around the numbers
/(?<! \b(?: Quarantine|Inspection)_Bin ) \( (\d+) \)/xg;
or pull the opening paren inside the lookbehind (so it is not consumed) and leave out the closing one
/(?<! \b(?: Quarantine|Inspection)_Bin \( ) \d+/xg;
These print the lines 5 2 and 2 2, with no parens.
The /x modifier allows whitespace inside the pattern for readability.
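One caveat with the variant that moves the opening paren into the lookbehind: the sample data only has single-digit counts. With a multi-digit count the engine can skip the first digit and match the rest. The sketch below shows this on a made-up line (the sample string and both sub names are assumptions for illustration) and one possible guard, an extra (?<!\d) that forces the match to begin at the start of a digit run.

```perl
use strict;
use warnings;

# Hypothetical sample with a two-digit count in an excluded bin.
my $line = "Quarantine_Bin(25),Regular_Bin(13)";

# Variant with the paren inside the lookbehind: only the position right after
# "Quarantine_Bin(" is rejected, so the engine retries one character later.
sub counts_unguarded {
    my ($s) = @_;
    return $s =~ /(?<! \b(?:Quarantine|Inspection)_Bin \( ) \d+/xg;
}

# Guarded variant: (?<!\d) rejects any start position in the middle of a number.
sub counts_guarded {
    my ($s) = @_;
    return $s =~ /(?<! \b(?:Quarantine|Inspection)_Bin \( ) (?<!\d) \d+/xg;
}

print join(" ", counts_unguarded($line)), "\n";   # prints "5 13"
print join(" ", counts_guarded($line)), "\n";     # prints "13"
```

The capturing form with \( (\d+) \) does not have this problem, because the lookbehind is anchored to the opening paren.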
Upvotes: 5
Reputation: 53478
It's in the comment, so I'll flesh it out as an actual answer.
I would suggest generally avoiding lookahead/lookbehind regexes, because they can get complicated and messy. In your use case, I'd probably just split the line into an array and handle each field individually.
Something like:
#!/usr/bin/env perl
use strict;
use warnings;

while ( <DATA> ) {
    chomp;
    # split on comma;
    # grep out Inspection_Bin and Quarantine_Bin
    my @fields = grep { not m/(?:Quarantine|Inspection)_Bin/ } split /,/;
    # iterate over each field, and select out two different regex matches, e.g.
    # word bit and number bit.
    print m/^(\w+)/, "=>", m/\((\d+)\)/, "\n" for @fields;
}
__DATA__
Quarantine(2),Other_Bin(2),Quarantine_Bin(2),Quarantine_Bin(2),Quarantine_Bin(5),Inspection_Bin(3),Regular_Bin(5),other(2)
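A variation on the same split-based idea, as a rough sketch: parse each field into a name and a count first, then compare the name exactly. The sub name kept_bins and the Name(count) field shape are assumptions made for this example. Exact comparison means a bin such as Quarantine_Bin_X would be kept, whereas the unanchored grep above would drop it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [name, count] pairs for bins that are not excluded.
sub kept_bins {
    my ($line) = @_;
    my @pairs;
    for my $field (split /,/, $line) {
        # Each field is expected to look like Name(123); skip anything else.
        my ($name, $count) = $field =~ /^(\w+)\((\d+)\)$/ or next;
        # Exact string comparison against the two excluded bin names.
        next if $name eq 'Quarantine_Bin' or $name eq 'Inspection_Bin';
        push @pairs, [$name, $count];
    }
    return @pairs;
}

print "$_->[0]=>$_->[1]\n"
    for kept_bins("Quarantine(2),Other_Bin(2),Quarantine_Bin(2),Inspection_Bin(3),Regular_Bin(5),other(2)");
```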
Upvotes: 1
Reputation: 385857
^(?!Quarantine_Bin)\([0-9]+\)
checks that the start of the string isn't followed by Quarantine_Bin but is followed by \([0-9]+\). That can only succeed if the string itself begins with a parenthesized number, which never happens with your data.
[^Quarantine_Bin] matches a single character that isn't B, Q, a, e, i, n, r, t, u or _. Not what you want.
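A small check of that character-class behaviour, as a sketch (the sub name is made up for this example): the class succeeds as soon as any one character outside the set is present, which would explain why it kept matching on the sample data.

```perl
use strict;
use warnings;

# Returns 1 if $s contains at least one character outside the set
# B Q a e i n r t u _ , and 0 otherwise.
sub has_char_outside_set {
    my ($s) = @_;
    return $s =~ /[^Quarantine_Bin]/ ? 1 : 0;
}

print has_char_outside_set("Quarantine_Bin"), "\n";      # prints 0: every character is in the set
print has_char_outside_set("Quarantine_Bin(2)"), "\n";   # prints 1: "(" is outside the set
```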
Without the filtering, you'd have
\b\w+\([0-9]+\)
You want to ensure the \b isn't followed by Quarantine_Bin or Inspection_Bin, so you can use
\b(?!Quarantine_Bin\b)(?!Inspection_Bin\b)\w+\([0-9]+\)
or
\b(?!(?:Quarantine|Inspection)_Bin\b)\w+\([0-9]+\)
The \b within the lookahead prevents Quarantine_Bin_X from being filtered out.
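To see the effect of that inner \b, here is a minimal sketch that runs the combined lookahead pattern over a made-up line (the sample string and the sub name kept_tokens are assumptions for illustration).

```perl
use strict;
use warnings;

# Returns every Name(count) token whose name is not exactly one of the excluded bins.
sub kept_tokens {
    my ($line) = @_;
    return $line =~ /\b(?!(?:Quarantine|Inspection)_Bin\b)\w+\([0-9]+\)/g;
}

my $line = "Quarantine(2),Quarantine_Bin(2),Quarantine_Bin_X(7),Inspection_Bin(3),Regular_Bin(5)";
print join(" ", kept_tokens($line)), "\n";
# prints "Quarantine(2) Quarantine_Bin_X(7) Regular_Bin(5)"
```

To get only the numbers, one option is to wrap [0-9]+ in capturing parentheses; in list context with /g Perl then returns the captures instead of the whole matches.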
Useful: (?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
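As an illustration of that idiom, the sketch below uses it to grab everything before the first occurrence of a stop string. The sub name prefix_before is hypothetical, and \Q...\E is added so the stop string is treated literally.

```perl
use strict;
use warnings;

# Returns the leading part of $text up to, but not including, the first
# occurrence of $stop (or all of $text if $stop does not occur).
sub prefix_before {
    my ($text, $stop) = @_;
    # Consume one character at a time, as long as $stop does not start here.
    my ($prefix) = $text =~ /^((?:(?!\Q$stop\E).)*)/s;
    return $prefix;
}

print prefix_before("Regular_Bin(5),Quarantine_Bin(2)", "Quarantine_Bin"), "\n";
# prints "Regular_Bin(5),"
```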
Upvotes: 2
Reputation: 785196
You should be using a negative lookbehind:
(?<!\b(?:Quarantine|Inspection)_Bin)\([0-9]+\)
(?<!\b(?:Quarantine|Inspection)_Bin) is a negative lookbehind that fails the match if Quarantine_Bin or Inspection_Bin comes immediately before our match.
\b is a word boundary.
Upvotes: 4