nowox

Reputation: 29086

Perl regex forward reference

I would like to match a pattern using a forward reference in a regular expression. The pattern I am looking for is:

[snake-case prefix]_[snake-case words] [same snake-case prefix]_number

For example:

foo_bar_eighty_twelve foo_bar_8012

I cannot extract foo_bar and eighty_twelve without first looking at foo_bar_8012. Thus I need a forward reference rather than a backward reference, which works only if my prefix is not itself snake-case.

my $prefix = "foo";
local $_ = "${prefix}_thirty_two = ${prefix}_32";

# Backward reference that works with a prefix with no underscores
{
    /(\w+)_(\w+) \s+ = \s+ \1_(\d+)/ix;
    print "Name: $2 \t Number: $3\n";
}

# Wanted: forward reference, which does not work :(
{
    /\2_(\w+) \s+ = \s+ (\w+)_\d+/ix;
    print "Name: $1 \t Number: $2\n";
}

Unfortunately, my forward reference does not work and I do not know why. I've read that Perl supports this kind of pattern.

Any help?

Upvotes: 0

Views: 1279

Answers (2)

Miller

Reputation: 35198

The following assumption is false:

“I cannot extract foo_bar and eighty_twelve without looking first at foo_bar_8012.”

Yes, it is true that you can't definitively determine where the break between prefix and name occurs in the first group of characters until you look at the second group, but therein lies the power of regular expressions. The engine matches greedily on the first pass, finds that the second string doesn't match, and then backtracks to try again with a shorter string for the prefix.

The following demonstrates how you would accomplish your goal using simple back references:

use strict;
use warnings;

while (<DATA>) {
    if (m{\b(\w+)_(\w+)\s+\1_(\d+)\b}) {
        print "Prefix = $1, Name = $2, Number = $3\n";
    } else {
        warn "Not found: $_"
    }
}
__DATA__
foo_thirty_two foo_32
foo_bar_eighty_twelve foo_bar_8012

Outputs:

Prefix = foo, Name = thirty_two, Number = 32
Prefix = foo_bar, Name = eighty_twelve, Number = 8012
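
The question's second snippet puts an = between the two words. As a minimal sketch of the same backtracking idea applied to that form: the helper name split_assignment and the named captures (Perl 5.10 or later) are illustrative assumptions, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): splits
# "prefix_name = prefix_number" into its three parts.
sub split_assignment {
    my ($line) = @_;
    # \k<prefix> is an ordinary back-reference. The greedy prefix first
    # grabs too much; backtracking shrinks it until the text after "="
    # starts with the same prefix.
    if ($line =~ /\b(?<prefix>\w+)_(?<name>\w+) \s* = \s* \k<prefix>_(?<number>\d+)\b/x) {
        return ($+{prefix}, $+{name}, $+{number});
    }
    return;    # empty list when the line does not fit the pattern
}

print join(', ', split_assignment('foo_bar_eighty_twelve = foo_bar_8012')), "\n";
```

For the sample line this should print foo_bar, eighty_twelve, 8012.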

Upvotes: 2

thelogix

Reputation: 610

AFAIK, forward referencing is not a magic bullet that lets you swap the capture group and the reference.

I've looked at quite a few examples, and I simply don't think you can do what you're trying to do using forward referencing.
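
As far as I can tell, a forward reference only matches once the referenced group has already captured something, which in practice means inside a repeated group. A rough sketch of the difference; the 'oneonetwo' test string is an illustrative example of mine, not something from the question:

```perl
use strict;
use warnings;

# Forward reference inside a repeated group: on the first pass \2 is
# unset, so that alternative fails and (one) captures instead; on the
# second pass \2 holds "one" and the first alternative can match.
my $loop_matches = ('oneonetwo' =~ /^(\2two|(one))+$/) ? 1 : 0;

# The question's pattern: \2 is tried before group 2 has captured
# anything, so the match fails at every starting position.
my $question_matches =
    ('foo_thirty_two = foo_32' =~ /\2_(\w+) \s+ = \s+ (\w+)_\d+/ix) ? 1 : 0;

print "loop: $loop_matches, question: $question_matches\n";
```

This should print loop: 1, question: 0.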

I solved the issue by using back-referencing combined with look-ahead. Like so:

/(?=.*=\s*([a-z]+))\1_(\w+) \s+ = \s+ \w+_\d+/ix

This works because the look-ahead initializes the first capture group ahead of the "actual" expression. For reference, this part is the look-ahead:

(?=.*=\s*([a-z]+))

and it's basically a kind of "sub-regex". The reason I use [a-z]+ is that \w+ includes the underscore, and I don't think that was what you wanted.
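
One caveat: with [a-z]+ the look-ahead stops at the first underscore, so for foo_bar_eighty_twelve = foo_bar_8012 it yields the prefix foo and the name bar_eighty_twelve. A possible variant, sketched under the assumption that the right-hand side always ends in _number, lets the look-ahead capture everything before that final _number instead. The helper name split_with_lookahead is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the look-ahead captures the prefix from the
# right-hand side first; \1 is then an ordinary back-reference used
# on the left-hand side.
sub split_with_lookahead {
    my ($line) = @_;
    # Inside the look-ahead, (\w+) is greedy and backtracks until it is
    # followed by "_digits", so a snake-case prefix is captured whole.
    if ($line =~ /^(?=.*=\s*(\w+)_\d+\b)\1_(\w+) \s+ = \s+ \w+_(\d+)/ix) {
        return ($1, $2, $3);
    }
    return;    # empty list when the line does not fit the pattern
}

print join(', ', split_with_lookahead('foo_bar_eighty_twelve = foo_bar_8012')), "\n";
```

For the sample line this should print foo_bar, eighty_twelve, 8012.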

Upvotes: 0
