Reputation: 113
I'm having trouble with regex grouping in Perl.
Of course this is part of a much larger problem, but it's the same concept I'm dealing with. Thank you all in advance for your comments and ideas.
The regex below should only consider this portion of the string when deciding whether to match:
doctor_who:ee
doctor_who:ep
doctor_who:ex
but not
doctor_who:eeh
code:
$str = "doctor_who:ee123ABC451234.123"; #match
$str = "doctor_who:ep123YXZ451234.123"; #match
$str = "doctor_who:ex123451234.123"; #match
$str = "doctor_who:eeh1234LMNOP51234.123"; ##should not match
$str = "doctor_who:abc12341234.123"; ##should not match
$regex = "doctor_who:e[e|p|x]"; #--->problem, what to add/remove?
if ($str =~ m/$regex/){
print "match!";
}
else {
print "not matched\n";
}
Upvotes: 0
Views: 191
Reputation: 9306
It's trivial with a negative lookahead zero-width assertion. This assumes the only thing you specifically don't want to match is doctor_who:eeh*:
/doctor_who:e(?!eh)[epx]/
In the above example, the lookahead runs every time we match doctor_who:e. We can gain efficiency by using it only when absolutely necessary, as noted in the comments by @ikegami:
/doctor_who:e(?:[px]|e(?!h))/
That defers the lookahead: it runs only when the second character after the : is not p or x, and then only if that character is e.
The second example shown in the comments doesn't use lookarounds at all:
/doctor_who:e(?:[px]|e[^h])/
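As a rough check, the three patterns above can be run against the sample strings from the question. This is only a sketch: the names @samples, %patterns and matches are made up for illustration, and the expected outcomes are the ones given in the question's comments.

```perl
use strict;
use warnings;

# Sample strings from the question, with the expected outcome (1 = match).
my @samples = (
    [ 'doctor_who:ee123ABC451234.123'    => 1 ],
    [ 'doctor_who:ep123YXZ451234.123'    => 1 ],
    [ 'doctor_who:ex123451234.123'       => 1 ],
    [ 'doctor_who:eeh1234LMNOP51234.123' => 0 ],
    [ 'doctor_who:abc12341234.123'       => 0 ],
);

# The three patterns from this answer, precompiled with qr//.
# The hash keys are hypothetical labels, not names from the thread.
my %patterns = (
    lookahead_always   => qr/doctor_who:e(?!eh)[epx]/,
    lookahead_deferred => qr/doctor_who:e(?:[px]|e(?!h))/,
    no_lookaround      => qr/doctor_who:e(?:[px]|e[^h])/,
);

# Returns 1 if $str matches $re, 0 otherwise.
sub matches {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

for my $name (sort keys %patterns) {
    for my $sample (@samples) {
        my ($str, $expected) = @$sample;
        my $got = matches($patterns{$name}, $str);
        printf "%-18s %-34s %s\n", $name, $str,
            $got == $expected ? 'ok' : 'UNEXPECTED';
    }
}
```

One difference worth knowing: a string that ends right after doctor_who:ee matches both lookahead versions but not the last pattern, because [^h] has to consume a character while (?!h) does not.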
Upvotes: 2
Reputation: 33631
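You can do this in two ways, and you combined the two. Inside a character class, | is a literal pipe rather than an alternation, so [e|p|x] also matches the | character.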
Use a character class:
$regex = 'doctor_who:e[epx]';
Use an alternation:
$regex = 'doctor_who:e(e|p|x)';
These provide the positive match, but they need something else to reject the eeh match.
But is eeh the only match to reject? That's the only one you mentioned, but would you prefer something more general, such as any a-z character? For example, what about eec? Should it match or be rejected?
From the examples, ee1 matches. Is that because 1 is not h or because it's a digit?
It isn't totally clear [to me, at least] what the best, most succinct regex should be, because there are some loopholes in the examples. So, here are some regexes based on assumptions I've made as to what you'd really like.
So, if eeh is the only rejection, add:
$regex .= '[^h]';
If you'd like a broader rejection:
$regex .= '[^a-z]';
Or, perhaps, you'd only like to match on numeric:
$regex .= '[0-9]';
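To see how the three suffixes differ, here is a small sketch that appends each one to the character-class version and tries a few inputs. The inputs doctor_who:eec and doctor_who:ee_ are made up for illustration, as are the names %regex_for and variant_matches.

```perl
use strict;
use warnings;

my $base = 'doctor_who:e[epx]';

# The three suffixes suggested above, each appended to the same base pattern.
# The hash keys are hypothetical labels.
my %regex_for = (
    not_h      => $base . '[^h]',
    not_letter => $base . '[^a-z]',
    digit_only => $base . '[0-9]',
);

# Returns 1 if $str matches the named variant, 0 otherwise.
sub variant_matches {
    my ($variant, $str) = @_;
    return $str =~ /$regex_for{$variant}/ ? 1 : 0;
}

# eec and ee_ are assumed test inputs, not taken from the question.
for my $str ('doctor_who:ee1', 'doctor_who:eeh', 'doctor_who:eec', 'doctor_who:ee_') {
    for my $variant (sort keys %regex_for) {
        printf "%-16s %-11s %s\n", $str, $variant,
            variant_matches($variant, $str) ? 'match' : 'no match';
    }
}
```

All three variants accept ee1 and reject eeh. They only disagree on inputs like eec (accepted by [^h] alone) and ee_ (rejected by [0-9] alone), which is why the choice depends on what you really want to exclude.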
Side note: This answer has been edited to reflect the comments below
Upvotes: 1
Reputation: 33
Since you are not matching at the end of the string, I think you would need two regexes.
$regex = "doctor_who:e[epx]"; # match
$not_regex = "doctor_who:e[epx][a-z]"; #-do not match
Then just do
if( $string =~ $regex and $string !~ $not_regex ){}
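A runnable sketch of the two-regex check might look like the following. The sub name wanted is hypothetical, and the sample strings are the ones from the question.

```perl
use strict;
use warnings;

my $regex     = qr/doctor_who:e[epx]/;        # must match
my $not_regex = qr/doctor_who:e[epx][a-z]/;   # must not match

# Returns 1 when $string passes the positive test and fails the negative one.
sub wanted {
    my ($string) = @_;
    return ($string =~ $regex and $string !~ $not_regex) ? 1 : 0;
}

for my $string (
    'doctor_who:ee123ABC451234.123',
    'doctor_who:ep123YXZ451234.123',
    'doctor_who:ex123451234.123',
    'doctor_who:eeh1234LMNOP51234.123',
    'doctor_who:abc12341234.123',
) {
    print "$string: ", (wanted($string) ? "match!" : "not matched"), "\n";
}
```

Note that [a-z] only covers lowercase letters, so a string such as doctor_who:eeH would still be accepted. If uppercase letters should be rejected as well, [A-Za-z] or the /i modifier on $not_regex would be one way to do it.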
Upvotes: 1