Mazze

Reputation: 197

Perl: How to get multiple regex captures in a structured way?

I am trying to get all occurrences of a group of patterns in an arbitrary string, much like this:

use strict;
use warnings;
use Data::Dumper;

my $STRING = "I have a blue cat. That cat is nice, but also quite old. She is always bored.";

foreach (my @STOPS = $STRING =~ m/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/g ) {
  print Dumper \%+, \@STOPS;
}

But the outcome is not what I expected, and I don't fully understand why:

$VAR1 = {
          'FINAL_WORD' => 'old',
          'FIRST_WORD' => 'She'
        };
$VAR2 = [
          'cat',
          'That',
          'old',
          'She'
        ];
$VAR1 = {
          'FINAL_WORD' => 'old',
          'FIRST_WORD' => 'She'
        };
$VAR2 = [
          'cat',
          'That',
          'old',
          'She'
        ];
$VAR1 = {
          'FINAL_WORD' => 'old',
          'FIRST_WORD' => 'She'
        };
$VAR2 = [
          'cat',
          'That',
          'old',
          'She'
        ];
$VAR1 = {
          'FINAL_WORD' => 'old',
          'FIRST_WORD' => 'She'
        };
$VAR2 = [
          'cat',
          'That',
          'old',
          'She'
        ];

If there is no better solution, I could live with what ends up in @STOPS and omit the loop. But I would prefer to get each pair of matches separately, and I don't see a way to do that.

And why is the loop executed multiple times at all?

Thank you in advance, and regards,

Mazze

Upvotes: 2

Views: 132

Answers (1)

Håkon Hægland

Reputation: 40778

You need to use a while loop, not a for loop:

while ($STRING =~ m/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/g ) {
    print Dumper \%+;
}

Output:

$VAR1 = {
          'FIRST_WORD' => 'That',
          'FINAL_WORD' => 'cat'
        };
$VAR1 = {
          'FIRST_WORD' => 'She',
          'FINAL_WORD' => 'old'
        };

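The for loop evaluates the match in list context, where /g returns the captures of all matches at once as one flat list (cat, That, old, She), which is stored in @STOPS. foreach then iterates over that four-element list, which is why the body runs four times, and by then %+ holds only the named captures of the last successful match. The while loop evaluates the match in scalar context, so each iteration advances to the next match and %+ contains the captures of that match only.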
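To keep each pair rather than just print it, a minimal sketch (not part of the original answer) is to push a copy of %+ onto an array inside the while loop. %+ is overwritten by every match, so pushing \%+ itself would leave every entry pointing at the same hash. For comparison, the sketch also groups the flat list-context result two elements at a time, which only works because the pattern has exactly two capture groups. The names @pairs and @pairs_from_list are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

my $STRING = "I have a blue cat. That cat is nice, but also quite old. She is always bored.";
my $re     = qr/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/;

# Approach 1: scalar-context //g in a while loop.
# %+ changes on every match, so store a copy in a new anonymous hash
# instead of a reference to %+ itself. (@pairs is an illustrative name.)
my @pairs;
while ($STRING =~ /$re/g) {
    push @pairs, { %+ };
}

# Approach 2: list-context //g returns every capture of every match as
# one flat list. With exactly two groups in the pattern, the list can be
# consumed two elements at a time. Note that splice empties @flat.
my @flat = $STRING =~ /$re/g;
my @pairs_from_list;
while (my ($final, $first) = splice @flat, 0, 2) {
    push @pairs_from_list, { FINAL_WORD => $final, FIRST_WORD => $first };
}

$Data::Dumper::Sortkeys = 1;    # print hash keys in a stable order
print Dumper \@pairs, \@pairs_from_list;
```

The while-loop version is the more robust of the two: it keeps working if capture groups are added, whereas the splice version hard-codes the group count and order.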

According to perldoc perlretut:

The modifier /g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have /g jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the pos() function.
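The position tracking described in the quote can be observed directly. This is a small sketch, with illustrative variable names, that records pos() after each scalar-context match, checks that pos() is reset once the final attempt fails (no /c modifier is used), and then assigns to pos() so the next match starts at "That" and skips the first pair.

```perl
use strict;
use warnings;

my $STRING = "I have a blue cat. That cat is nice, but also quite old. She is always bored.";
my $re     = qr/(?<FINAL_WORD>\w+)\.\s*(?<FIRST_WORD>\w+)/;

# Each successful scalar-context //g match leaves pos($STRING) just past
# the end of the match; the next match resumes from that offset.
my @offsets;
while ($STRING =~ /$re/g) {
    push @offsets, pos($STRING);
    print "$+{FINAL_WORD}/$+{FIRST_WORD} ends at offset ", pos($STRING), "\n";
}

# The final, failing attempt resets pos() to undef (no /c modifier here),
# so a later //g match would start from the beginning again.
my $reset = !defined pos($STRING);
print "pos reset: ", ($reset ? "yes" : "no"), "\n";

# pos() is also assignable: start the search at "That" to skip the first pair.
pos($STRING) = index($STRING, 'That');
my $after_skip = '';
if ($STRING =~ /$re/g) {
    $after_skip = "$+{FINAL_WORD}/$+{FIRST_WORD}";
}
print "after skipping: $after_skip\n";
```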

Upvotes: 6
