Reputation: 23824
The following expression
print Dumper "a.b.c.d" =~ /(.)(?:\.(.))*/
prints
$VAR1 = 'a';
$VAR2 = 'd';
Where are b and c? How do I get them?
Update: Further simplification
The following expression
print Dumper "abcd" =~ /(.)+/
captures only the last character:
$VAR1 = 'd';
How to capture all characters?
Another update for those who do not believe me that I can neither use the global modifier nor split.
I would like to capture all digits in the following example.
print Dumper "x1234y" =~ /(\d)+/
The expression captures just the last digit:
$VAR1 = '4';
What I want is this:
@digits = ();
"x1234y" =~ /^x(?:(\d)(?{push @digits, $^N}))+y$/;
print Dumper @digits;
Which captures all digits:
$VAR1 = '1';
$VAR2 = '2';
$VAR3 = '3';
$VAR4 = '4';
The Perl documentation marks the code embedding feature with the following warning:
Using this feature safely requires that you understand its limitations.
And because I am not sure whether I understand the limitations, I thought there might be a simpler way to get the same result. That is what this question is about.
Upvotes: 2
Views: 123
Reputation:
I don't think there's a way to get a single capture group inside a repetition operator to produce multiple output strings. And you still haven't shown an example that isn't handled perfectly well by a simple capture of the whole sequence and then split afterward.
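As a minimal sketch of the "capture the whole sequence, then split" approach, assuming the input really has the shape used in the question's example (the variable names `$input` and `$run` are only illustrative):

```perl
use strict;
use warnings;
use Data::Dumper;

my $input = "x1234y";
my @digits;

# Capture the whole run of digits once, with the anchors validating the shape.
if ( my ($run) = $input =~ /^x(\d+)y$/ ) {
    # Split the captured run into single characters afterwards.
    @digits = split //, $run;
}

print Dumper @digits;
```

The anchored match does the validation and `split` does the extraction, so no code needs to run inside the regex.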
The (?{...}) solution isn't bad, especially in your example where the string is anchored at both ends, so there's no chance of it succeeding with any backtracking.
To work in a more general case, you should initialize the array just before the repetition starts, so you don't get leftovers from partial matches, like this:
@digits=();
"x8w x9z x1234y" =~ /x(?:(\d)(?{push @digits, $^N}))+y/;
# bad, @digits=(8,9,1,2,3,4)
"x88w x9z x1234y" =~ /x(?{@digits=()})(?:(\d)(?{push @digits, $^N}))+y/;
# better, @digits=(1,2,3,4)
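Another way to avoid leftovers, sketched here on the assumption that `local` inside a code block behaves as perlre describes (the localization is undone when the engine backtracks), is to build the list in a localized package array and copy it out only at the very end of the pattern. The name `@acc` is a hypothetical scratch variable, not something from the question:

```perl
use strict;
use warnings;

our @acc;      # scratch array; each iteration localizes a longer copy
my @digits;    # receives the result only if the whole pattern matches

"x8w x9z x1234y" =~ /
    x
    (?: (\d) (?{ local @acc = (@acc, $^N) }) )+
    y
    (?{ @digits = @acc })
/x;

print "@digits\n";
```

Because the failed attempts at `x8w` and `x9z` are unwound, this should leave @digits holding 1, 2, 3, 4 without an explicit reset at the start of the pattern.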
Upvotes: 1
Reputation: 53478
They're not there because you have a * outside your capture group, and you're not repeating the pattern.
Thus what happens in the regex engine is:
Compiling REx "(.)(?:\.(.))*"
Final program:
1: OPEN1 (3)
3: REG_ANY (4)
4: CLOSE1 (6)
6: CURLYX[1] {0,32767} (16)
8: EXACT <.> (10)
10: OPEN2 (12)
12: REG_ANY (13)
13: CLOSE2 (15)
15: WHILEM[1/1] (0)
16: NOTHING (17)
17: END (0)
minlen 1
Matching REx "(.)(?:\.(.))*" against "a.b.c.d"
0 <> <a.b.c.d> | 1:OPEN1(3)
0 <> <a.b.c.d> | 3:REG_ANY(4)
1 <a> <.b.c.d> | 4:CLOSE1(6)
1 <a> <.b.c.d> | 6:CURLYX[1] {0,32767}(16)
1 <a> <.b.c.d> | 15: WHILEM[1/1](0)
whilem: matched 0 out of 0..32767
1 <a> <.b.c.d> | 8: EXACT <.>(10)
2 <a.> <b.c.d> | 10: OPEN2(12)
2 <a.> <b.c.d> | 12: REG_ANY(13)
3 <a.b> <.c.d> | 13: CLOSE2(15)
3 <a.b> <.c.d> | 15: WHILEM[1/1](0)
whilem: matched 1 out of 0..32767
3 <a.b> <.c.d> | 8: EXACT <.>(10)
4 <a.b.> <c.d> | 10: OPEN2(12)
4 <a.b.> <c.d> | 12: REG_ANY(13)
5 <a.b.c> <.d> | 13: CLOSE2(15)
5 <a.b.c> <.d> | 15: WHILEM[1/1](0)
whilem: matched 2 out of 0..32767
5 <a.b.c> <.d> | 8: EXACT <.>(10)
6 <a.b.c.> <d> | 10: OPEN2(12)
6 <a.b.c.> <d> | 12: REG_ANY(13)
7 <a.b.c.d> <> | 13: CLOSE2(15)
7 <a.b.c.d> <> | 15: WHILEM[1/1](0)
whilem: matched 3 out of 0..32767
7 <a.b.c.d> <> | 8: EXACT <.>(10)
failed...
whilem: failed, trying continuation...
7 <a.b.c.d> <> | 16: NOTHING(17)
7 <a.b.c.d> <> | 17: END(0)
Match successful!
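In the trace, OPEN2/CLOSE2 run three times, and each pass overwrites the previous contents of the second group. A small sketch to confirm that only the last pass survives, using the built-in offset arrays @- and @+ (the variable names are illustrative):

```perl
use strict;
use warnings;

my ( $first, $last_seen, $start, $end );

if ( "a.b.c.d" =~ /(.)(?:\.(.))*/ ) {
    ( $first, $last_seen ) = ( $1, $2 );
    # @- and @+ hold the start and end offsets of each capture group.
    ( $start, $end ) = ( $-[2], $+[2] );
}

printf "group 1: '%s'\n", $first;
printf "group 2: '%s' at offsets %d..%d\n", $last_seen, $start, $end;
```

Group 2 reports the position of the final `d` only; the earlier iterations that matched `b` and `c` leave no trace in the capture variables.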
Your non-capturing bracket (?: is redundant as well.
How about:
print Dumper "a.b.c.d" =~ /(.)\.(.)/g;
Which prints:
$VAR1 = 'a';
$VAR2 = 'b';
$VAR3 = 'c';
$VAR4 = 'd';
Or alternatively (it wasn't entirely clear what you were seeking):
print Dumper "a.b.c.d" =~ /(.)\.(.+)/;
$VAR1 = 'a';
$VAR2 = 'b.c.d';
Update: Further simplification
print Dumper "abcd" =~ /(.)+/
How to capture all characters?
Again - the problem is - you're capturing a single character in your brackets, and then using + to alter how much of the pattern is 'consumed'. So you will only ever get a single character here, because you told it to.
If you want all of them, put the plus inside the brackets and you will get a single scalar.
print Dumper "abcd" =~ /(.+)/;
If you want each of them as separate elements:
print Dumper "abcd" =~ /(.)/g;
Repeat the capture operation. Or use split:
print Dumper split //, "abcd";
Edit: Following your last edit:
"x1234y" =~ /^x(?:(\d)(?{push @digits, $^N}))+y$/;
This still doesn't properly illustrate the question you're asking, because this works:
my @digits = "x1234y" =~ m/(\d)/g;
print Dumper \@digits;
And if it didn't, then this works:
my @digits = split ( //, "x1234y" =~ s/\D//rg );
Or this:
my @digits = grep { /\d/ } split //, "x1234y";
Upvotes: 1