Reputation: 7938
I have a regex as below:
$regex = qr/(?sx-im:(?sx-im:(?:^|(?<=\n)))(?=(?sx-im:[\ \t]*)(?sx-im:(?:^|(?<=\n))Data\ and\ value)(?sx-im:[\ \t\r]*(?:$|\n))))/;
I am matching it against the following text:
$text ="Data and value";
Now I want to get the match-start offset, match-end offset and matched text.
Normally I use @-, @+ and $& to get these, like below:
if ($text =~ m/$regex/)
{
    print "START Offset = ".$-[0]."\n";
    print "END Offset = ".$+[0]."\n";
    print "Matched Text = ".$&."\n";
}
In this case the match succeeds, but I am not getting the offsets and matched text I expect. It just prints 0 as both the match-start and match-end offset, and an empty string for the matched text.
I want to understand the different components of this regex, specifically what (?sx-im: means, and how to get the matched text.
Please don't ask me why the regex looks like this or suggest that I change it. It is a software-generated regex, and I have simplified my problem for the sake of the question.
Please guide me on where to start understanding this regex and how to get the match offsets.
Upvotes: 0
Views: 144
Reputation: 3795
The bug is in your regex, not your understanding of match offsets. It is matching a zero-width string at the start of the string, and correctly reporting start and end offsets of 0.
Now why it matches this is another good question. You can split the regex thus (untested):
qr/(?sx-im:
(?sx-im:(?:^|(?<=\n)))
(?=(?sx-im:[\ \t]*)(?sx-im:(?:^|(?<=\n))Data\ and\ value)(?sx-im:[\ \t\r]*(?:$|\n)))
)/x
And you can see the two sequential halves of it: the first matches at the start of the string or at a position just after a \n (a look-behind), and the second is a look-ahead assertion, i.e. both are zero-width.
You appear to be trying to do too much with a regex, in particular matching the start and end of lines. Consider reading your source file line-by-line and processing individual lines rather than trying to do it all with a regex.
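To make the zero-width behaviour concrete, here is a minimal sketch. It uses a shortened stand-in for the generated pattern (an anchor followed by a look-ahead), not the original regex, and the second pattern with a capturing group inside the look-ahead is a hypothetical variant added purely for illustration. All variable names are assumptions.

```perl
use strict;
use warnings;

my $text = "Data and value";

# Shortened stand-in for the generated pattern: an anchor followed by a
# look-ahead. Neither part consumes any characters.
my $zero_width = qr/(?sx-im:(?:^|(?<=\n))(?=Data\ and\ value))/;

my ($start, $end, $matched);
if ($text =~ $zero_width) {
    ($start, $end, $matched) = ($-[0], $+[0], $&);
    print "START Offset = $start\n";       # 0
    print "END Offset   = $end\n";         # 0
    print "Matched Text = '$matched'\n";   # ''
}

# Hypothetical variant: a capturing group inside the look-ahead.
# The overall match is still zero-width, but group 1 records the text
# the look-ahead inspected, together with its own offsets.
my $with_capture = qr/(?sx-im:(?:^|(?<=\n))(?=(Data\ and\ value)))/;

my ($cap_start, $cap_end, $cap_text);
if ($text =~ $with_capture) {
    ($cap_start, $cap_end, $cap_text) = ($-[1], $+[1], $1);
    print "Group 1 START = $cap_start\n";  # 0
    print "Group 1 END   = $cap_end\n";    # 14
    print "Group 1 Text  = '$cap_text'\n"; # 'Data and value'
}
```

The first match reports 0 and 0 because nothing was consumed, which is exactly what the question observed. The second only works if the pattern can be altered to contain a capturing group, so it may not apply to a generated regex that has to stay untouched.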
Upvotes: 4
Reputation: 241828
(?: ... ) is a non-capturing group. It does not create a backreference.
Similarly, (?= ... ) is a zero-width look-ahead assertion. It does not include the matching string in $&.
See Extended Patterns.
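As for (?sx-im: ... ), it is the same non-capturing group with inline modifiers: the flags before the hyphen (s and x) are switched on inside the group, and the flags after it (i and m) are switched off. The short sketch below illustrates each effect on small made-up strings; the variable names and test strings are assumptions, not taken from the question.

```perl
use strict;
use warnings;

# (?i:...) turns case-insensitivity on only inside the group.
my $i_on  = "DATA and value" =~ /(?i:data) and/ ? 1 : 0;

# (?-i:...) turns it off inside the group even though /i is set outside.
my $i_off = "DATA and value" =~ /(?-i:data) and/i ? 1 : 0;

# (?x:...) ignores unescaped whitespace in the pattern, so a literal
# space has to be written as "\ ", as the generated regex does.
my $x_escaped   = "Data and value" =~ /(?x: Data \  and )/ ? 1 : 0;
my $x_unescaped = "Data and value" =~ /(?x: Data and )/    ? 1 : 0;

# (?s:...) lets "." match a newline; with s off it does not.
my $s_on  = "a\nb" =~ /(?s:a.b)/  ? 1 : 0;
my $s_off = "a\nb" =~ /(?-s:a.b)/ ? 1 : 0;

# (?:...) groups without capturing, so the first capture is "and".
my ($first_capture) = "Data and value" =~ /(?:Data) (and)/;

print "i on: $i_on, i off: $i_off\n";                         # 1, 0
print "x escaped: $x_escaped, unescaped: $x_unescaped\n";     # 1, 0
print "s on: $s_on, s off: $s_off\n";                         # 1, 0
print "first capture: $first_capture\n";                      # and
```

With m switched off, as in the question's regex, ^ matches only at the very start of the string, which is why the generated pattern adds the (?<=\n) alternative to recognise line starts.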
Upvotes: 4