Reputation: 525
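I have a file where each line is made up of comma-separated fields. Each field is a tag (a % marker followed by one or more letters), a space, and then the field content.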
An example line:
%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
Sets of tags are significant for searching -- This is the tag set for my example:
%t, %u, %v, %w, %x, %xx, %y, %z
I want to find the content of fields where the tag is in the set and the field content is repeated in a subsequent field tagged from the set. Here is the code of my unsuccessful attempt:
my $tagmrkr='%';
my $line='%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff';
my $searchtags = qr/t|u|v|w|x|xx|y|z/; # excludes q
print qq/The line:$line\n\n/;
for ($line =~ m/
$tagmrkr$searchtags\ ([^\,]*,)
.*?
$tagmrkr$searchtags\ \1
/gx) {
print qq/First field contents:$1\n/;
print qq/Entire match:$&\n/;
print qq/\n/;
}
I was expecting:
The line:%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
First field contents:this,
Entire match:%t this,%u that,%v this,
First field contents:the other,
Entire match:%x the other,%xx only once,%q the other,%z the other,
I got:
The line:%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
First field contents:the other,
Entire match:%x the other,%xx only once,%q the other,%z the other,
First field contents:the other,
Entire match:%x the other,%xx only once,%q the other,%z the other,
Question 1:
Why are the $1 and $& for the first match being replaced by the values from the second match?
Question 2: What should I change to get what I want (below), rather than what I expected?
What I want is to be able to re-pivot the match so that it also finds the repeated field in spite of overlaps -- where the first field of the second match occurs before the second field of the first match. Actually, for my immediate purposes, all I need is the duplicated field content.
I.e., I want 3 matches from the example:
The line:%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
First field contents:this
Entire match:%t this,%u that,%v this,
First field contents:that
Entire match:%u that,%v this,%t that,
First field contents:the other
Entire match:%x the other,%xx only once,%q the other,%z the other,
Upvotes: 3
Views: 110
Reputation: 66881
One way to provide for overlaps is to assert the presence of the rest of the phrase using a lookahead. That part is then not consumed, so the engine continues from before it and can match it again:
use warnings;
use strict;
use feature 'say';
my $s = q(%a astuff,%b bstuff,%t this,%u that,%v this,%t that,)
. q(%x the other,%xx only once,%q the other,%z the other,%c cstuff);
my $m = qr/%/;
my $t = qr/(?:t|u|v|w|x|xx|y|z)/;
while ($s =~ / $m$t \s ([^,]+) , (?=(.*?$m$t\s\g{1},?)) /gx) {
say "capture: $1";
say " whole: $1,$2";
}
For a more detailed explanation of how the lookahead helps in catching overlapping patterns see this post
Prints
capture: this
 whole: this,%u that,%v this,
capture: that
 whole: that,%v this,%t that,
capture: the other
 whole: the other,%xx only once,%q the other,%z the other,
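If the whole run of fields including the leading tag is wanted, as in the question's desired output, one possible variation is to move the entire pattern inside the lookahead so that nothing is consumed. This is only a sketch: the array name @found is made up for illustration, and the closing (?: , | \z ) is an added assumption that a repeated field ends at a comma or at the end of the line, so that "this" would not match a longer field such as "thistle".

```perl
use strict;
use warnings;

my $s = q(%a astuff,%b bstuff,%t this,%u that,%v this,%t that,)
      . q(%x the other,%xx only once,%q the other,%z the other,%c cstuff);

my $m = qr/%/;
my $t = qr/(?:t|u|v|w|x|xx|y|z)/;

# The whole pattern sits in a lookahead, so each match is zero-length and
# the engine retries from the next character; overlapping runs are found.
# Group 1 is the entire run of fields, group 2 is the repeated content.
my @found;    # hypothetical name: list of [content, whole run] pairs
while ($s =~ / (?= ( $m$t \s ([^,]+) , .*? $m$t \s \g{2} (?: , | \z ) ) ) /gx) {
    push @found, [$2, $1];
}

printf "content: %s\n  whole: %s\n", @$_ for @found;
```

On the example line this should report three pairs, with contents "this", "that" and "the other", each whole run starting at its own tag.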
Upvotes: 2
Reputation: 40758
Using a global match in a for loop evaluates it in list context, so all matches are performed at once and the loop then iterates over the returned captures. The match variables are therefore already set to the last successful match before the first iteration starts. Using the global match in a while condition evaluates it in scalar context, so it matches once per iteration and the match variables are correct for each iteration.
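The difference between the two contexts can be seen in a small sketch. The shortened line, the simplified pattern (%\w+ instead of the question's tag set) and the variable names are assumptions chosen only to keep the example short.

```perl
use strict;
use warnings;

my $line = '%t this,%u that,%v this,%x other,%z other';
my $re   = qr/%\w+ ([^,]*),.*?%\w+ \1/;

# List context: every match is performed before the loop body runs, so the
# loop variable receives each capture while $1 stays at the last match.
my @list_seen;
for my $capture ($line =~ /$re/g) {
    push @list_seen, "$capture/$1";
}

# Scalar context: one match per iteration, so $1 tracks the current match.
my @scalar_seen;
while ($line =~ /$re/g) {
    push @scalar_seen, $1;
}

print "list:   @list_seen\n";
print "scalar: @scalar_seen\n";
```

In the for loop the loop variable holds "this" and then "other", but $1 is "other" both times. In the while loop $1 is "this" and then "other".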
You can get all three matches by resetting pos $line in each iteration, so that the next search starts one character past the start of the current match. For example:
while ($line =~ m/
$tagmrkr$searchtags\ ([^\,]*,)
.*?
$tagmrkr$searchtags\ \1
/gx) {
pos $line = $-[0] + 1;
print qq/First field contents:$1\n/;
print qq/Entire match:$&\n/;
print qq/\n/;
}
Output:
The line:%a astuff,%b bstuff,%t this,%u that,%v this,%t that,%x the other,%xx only once,%q the other,%z the other,%c cstuff
First field contents:this,
Entire match:%t this,%u that,%v this,
First field contents:that,
Entire match:%u that,%v this,%t that,
First field contents:the other,
Entire match:%x the other,%xx only once,%q the other,%z the other,
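The effect of resetting pos can be isolated on a much smaller input. The following sketch uses a made-up string and array names, and records only the start offset $-[0] of each match.

```perl
use strict;
use warnings;

my $text = 'aaaa';

# Without a reset, each search resumes where the previous match ended.
my @plain;
while ($text =~ /aa/g) {
    push @plain, $-[0];
}

# Moving pos to one past the start of the match lets matches overlap.
my @overlapping;
while ($text =~ /aa/g) {
    push @overlapping, $-[0];
    pos($text) = $-[0] + 1;
}

print "plain:       @plain\n";
print "overlapping: @overlapping\n";
```

The plain loop finds matches starting at offsets 0 and 2, while the loop that resets pos also finds the overlapping match at offset 1.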
Upvotes: 0