Reputation: 83
In this string:
"<0> <<1>> <2>> <3> <4>"
I want to match all instances of "<\d{1,2}>" except those I have escaped with an extra set of triangle brackets, i.e., I want to match 0, 2, 3 and 4 but not 1:
"<0> <<1>> <2>> <3> <4>"
I want to do this in one single regular expression but the best I could get is:
(^|[^\<])\<(?<1>\d{1,2})>([^>]|$)
Which will match 0,3,4 but not 2, e.g.:
"<0> <<1>> <2>> <3> <4>"
Does anyone know how this can be done with a single regular expression?
Upvotes: 4
Views: 319
Reputation: 34120
Perl:
use strict;
use warnings;
my $str = "<0> <<1>> <2>> <3> <4>";
my @array = grep {defined $_} $str =~ /<<\d+>>|<(\d+)>/g;
print join( ', ', @array ), "\n";
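For readers not on Perl, the same trick (let the escaped form win the alternation, and capture only the unescaped form) can be sketched in Python. The function name `unescaped_tags` is made up for illustration, and this is only a minimal sketch of the idea:

```python
import re

def unescaped_tags(text):
    """Return the digits of every <N> tag that is not written as <<N>>."""
    # The first alternative consumes escaped tags without capturing anything;
    # only the second alternative fills group 1.
    pattern = re.compile(r"<<\d+>>|<(\d+)>")
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]

print(unescaped_tags("<0> <<1>> <2>> <3> <4>"))  # ['0', '2', '3', '4']
```

Filtering on `m.group(1) is not None` plays the role of `grep {defined $_}` in the Perl version.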
Upvotes: 0
Reputation: 75222
In case you're using a regex flavor (like Java's) that supports lookarounds but not conditionals, here's another approach:
(?=(<\d{1,2}>))(?!(?<=<)\1(?=>))\1
The first lookahead ensures that you're at the beginning of a tag and captures it for later use. The subexpression in the second lookahead matches the tag again, but only if it's preceded by a < and followed by a >. Making it a negative lookahead achieves the NOT(x AND y) semantics you're looking for. Finally, the second \1 matches the tag again, this time for real (i.e., not in a lookaround).
BTW, I could have used > instead of (?=>) in the second lookahead, but I think this way is easier to read and expresses my intent better.
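As a quick check, the pattern can be tried in Python, whose `re` module also has lookarounds but no lookaround conditionals. The helper name `find_tags` and the second test string are assumptions added for illustration:

```python
import re

# Pattern copied from the answer above; used here unchanged.
TAG = re.compile(r"(?=(<\d{1,2}>))(?!(?<=<)\1(?=>))\1")

def find_tags(text):
    # group(0) is the text consumed by the final \1, i.e. the tag itself.
    return [m.group(0) for m in TAG.finditer(text)]

print(find_tags("<0> <<1>> <2>> <3> <4>"))  # ['<0>', '<2>', '<3>', '<4>']
print(find_tags("<<5> <6>>"))               # ['<5>', '<6>']
```

The second call shows the NOT(x AND y) behaviour: a tag with an extra bracket on one side only is still matched.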
Upvotes: 0
Reputation: 7831
Presuming that with the input set
"<0> <<1>> <2>> <3> <4><<5>"
we want to match 0, 2, 3, 4 and 5.
The problem is that you need to use zero-width look-ahead and zero-width look-behind, but there are three cases to match (an extra '<' on the left only, an extra '>' on the right only, and no extra brackets at all) and one not to match (extra brackets on both sides, '<<...>>'). Also, if you want to be able to extract the matched expressions so that you can assign the matches to an array, you need to avoid capturing things you don't need. So I ended up with the non-elegant
use Data::Dumper;
my $a = "<0> <<1>> <2>> <3> <4><<5>";
my $brace_pair = qr/<[^<>]+>/;
my @matches = $a =~ /(?:(?<!<)$brace_pair(?!>))|(?:$brace_pair(?!>))|(?:(?<!<)$brace_pair)/g;
print Dumper(\@matches);
If you wanted to cram this into a single expression - you could.
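One way the single expression might look, sketched in Python. The first of the three alternatives above is already covered by either of the other two, so two branches are enough. The names `BRACE_PAIR`, `SINGLE` and `find_pairs` are illustrative only:

```python
import re

BRACE_PAIR = r"<[^<>]+>"  # same fragment as $brace_pair above

# Keep a pair unless it has '<' immediately before AND '>' immediately after.
SINGLE = re.compile(rf"(?<!<){BRACE_PAIR}|{BRACE_PAIR}(?!>)")

def find_pairs(text):
    # No capture groups, so findall returns the whole matches.
    return SINGLE.findall(text)

print(find_pairs("<0> <<1>> <2>> <3> <4><<5>"))
# ['<0>', '<2>', '<3>', '<4>', '<5>']
```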
Upvotes: 1
Reputation: 164639
Here's an alternative to a single regex. Split it into a list at the >< boundaries and then just exclude the pieces that still look like <...>.
#!/usr/bin/perl -lw
$s = "<0> <<1>> <2>> <3> <4>";
print join " ",
map { /(\d+)/; $1 }
grep !/^<.*>$/,
split />\s*</, $s;
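The same split-then-filter steps, written out in Python as a rough sketch. The function name `tags_by_split` is hypothetical, and the code assumes every remaining piece contains a number:

```python
import re

def tags_by_split(text):
    # Split on the "> <" boundaries; whitespace between the brackets is optional.
    pieces = re.split(r">\s*<", text)
    # A piece that still starts with '<' and ends with '>' was an escaped tag.
    kept = [p for p in pieces if not re.fullmatch(r"<.*>", p)]
    # Pull the number out of each remaining piece (assumes one is present).
    return [re.search(r"\d+", p).group() for p in kept]

print(tags_by_split("<0> <<1>> <2>> <3> <4>"))  # ['0', '2', '3', '4']
```

For the sample string the split yields `<0`, `<1>`, `2>`, `3` and `4>`, so only `<1>` keeps both of its brackets and gets dropped.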
Upvotes: 0
Reputation: 7378
You can also try conditionals:
(?(?<=<)(<\d{1,2}>(?!>))|(<\d{1,2}>))
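The conditional reads: if the current position is preceded by <, match a tag that is not followed by >; otherwise match any tag. Python's `re` module only supports conditionals that test a capture group, not a lookaround, so the sketch below spells the two branches out as an alternation. It is an emulation for illustration, not the conditional syntax itself, and the names are made up:

```python
import re

TAG = r"<\d{1,2}>"

# Branch 1: preceded by '<', so the tag must not be followed by '>'.
# Branch 2: not preceded by '<', so any tag is accepted.
EMULATED = re.compile(rf"(?<=<){TAG}(?!>)|(?<!<){TAG}")

def find_conditional(text):
    return EMULATED.findall(text)

print(find_conditional("<0> <<1>> <2>> <3> <4>"))  # ['<0>', '<2>', '<3>', '<4>']
```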
Upvotes: 5
Reputation: 545508
You can use a negative look-behind zero-width assertion:
(?<!<)<\d{1,2}>
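A short Python check of this pattern against the string from the question (the variable name `TAG` is just for illustration):

```python
import re

# Match a tag only when it is not immediately preceded by another '<'.
TAG = re.compile(r"(?<!<)<\d{1,2}>")

print(TAG.findall("<0> <<1>> <2>> <3> <4>"))  # ['<0>', '<2>', '<3>', '<4>']
```

Because only the left side is inspected, a tag with a single stray < in front of it, such as <<5>, would be skipped as well.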
Upvotes: 2