Reputation: 387

How can I extract a string between two characters, and do it recursively?

I have a string:

123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123

Now I need to check that FOO1 appears together with the e_ prefix (as e_FOO1) inside its brackets. That is, there must not be a situation like this:

123 + FOO1[ccc + e_FOK1 ...]

My simple question is: how can I tell Perl to capture a word such as FOO1?

I thought of searching between two characters, " " and "[",

and then checking that the word is written correctly after " e_" inside the "[..]".

How can I do it recursively?

Upvotes: 0

Answers (4)

Ether

Reputation: 53986

Based on some of your comments, I'm going to assume that your question is: "between the '[' and ']' brackets, ensure that any 'e_' symbol is 'e_FOO' and not something else".

(Edit: okay, it appears that you also need the "FOO" keyword to appear just before the square brackets. I'll revise the regex accordingly.)

use feature 'say';    # say() needs Perl 5.10 or later

if ($line =~ /
              ([A-Z0-9]+)  # match a keyword in caps or digits (e.g. FOO1), and save it for later
                           # (we can retrieve it with \1 or $1)
              \[           # match the first [
                [^\]]*     # some number of any character that isn't ]
                e_         # a ha, here's our e_
                \1         # and here's our keyword that we matched earlier
                [^\]]*     # some more of any character that isn't ]
              \]           # here's our closing ]
             /x)
{
     say "Good data";
}
else
{
     say "Bad data";
}

But please, start reading some of the tutorials in perldoc perlre.
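
For the narrower "how do I catch the FOO1 word" part of the question, a capture in a global match may be enough. This is a minimal sketch, assuming a keyword is any run of word characters immediately followed by '['; in list context, m//g returns the capture from every match:

```perl
use strict;
use warnings;

my $str = "123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123";

# Each (\w+) captured right before a '[' is returned as one list element,
# so nested keywords such as FOO2 are collected as well.
my @keywords = $str =~ /(\w+)\[/g;
print "@keywords\n";    # prints "FOO1 FOO2"
```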

Upvotes: 1

ghostdog74

Reputation: 342739

Since you said "I need to confirm that FOO1 is followed the "e_" string that inside its brackets", you just need to check for e_FOO1, right? There's no need for a complicated regex.

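use strict;
use warnings;

my $str = "123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123";
my $s = index($str, "[");        # position of the first '['
my $e = index($str, "]");        # position of the first ']'
my $f = index($str, "e_FOO1");   # -1 if e_FOO1 does not occur at all
if ( $f >= 0 and $f >= $s and $f <= $e ) {
    print "found\n";
}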
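
One caveat: index($str, "]") returns the first ']', which in the sample string closes FOO2's bracket, not FOO1's; the check passes only because e_FOO1 happens to come earlier. The sketch below keeps the index() approach but finds the ']' that actually balances FOO1's '[' by counting bracket depth. The helper names are hypothetical, and like the original it would also accept e_FOO12:

```perl
use strict;
use warnings;

# Hypothetical helper: index of the ']' that balances the '[' at $open,
# or -1 if the brackets never balance. Assumes substr($str, $open, 1) eq '['.
sub matching_close {
    my ($str, $open) = @_;
    my $depth = 0;
    for my $i ($open .. length($str) - 1) {
        my $c = substr $str, $i, 1;
        $depth++ if $c eq '[';
        $depth-- if $c eq ']';
        return $i if $depth == 0;    # back to the starting level
    }
    return -1;
}

# Hypothetical helper: 1 if e_KEYWORD occurs between KEYWORD[ and the ']'
# that closes it (text in nested brackets counts too), else 0.
sub ref_inside_brackets {
    my ($str, $keyword) = @_;
    my $start = index $str, $keyword . '[';
    return 0 if $start < 0;
    my $open  = $start + length $keyword;        # position of the '['
    my $close = matching_close($str, $open);
    return 0 if $close < 0;
    my $ref = index $str, 'e_' . $keyword, $open; # search from the '['
    return $ref >= 0 && $ref < $close ? 1 : 0;
}

my $str = "123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123";
print "found\n" if ref_inside_brackets($str, 'FOO1');
```

With this version, a string such as FOO1[ccc + FOO2[b_FOO2] + e_FOO1] is also accepted, because the search no longer stops at FOO2's closing bracket.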

Upvotes: 0

FMc

Reputation: 42421

If your situation is more complex than you've described, this code won't work (for example, it does nothing to ensure that your left and right brackets balance each other). However, the code does illustrate how to use back-references (see \1 below), which might get you on the right track.

use strict;
use warnings;

while (<DATA>){
    warn "Bad line: $_" unless / (\w+) \[ .* e_\1 .* \] /x;
}

__DATA__
123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123
123 + FOO1[ccc + e_FOOx + ddd + FOO2[b_FOO2]] = 123
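
Since the pattern above does not check that brackets balance, here is a hedged sketch of one way to add that, which also covers the "recursively" part of the question. It relies on Perl 5.10+ regex recursion, (?&NAME), so each KEYWORD[...] group is matched up to its own closing bracket, and then descends into every group body. The helper name extract_groups is hypothetical, and the e_KEYWORD test in the usage loop is only an illustration of the rule as stated in the question:

```perl
use strict;
use warnings;
use 5.010;    # named captures, (?&NAME) recursion and possessive ++ need 5.10+

# One KEYWORD[...] group whose brackets balance; the body may hold nested groups.
my $GROUP = qr/
    (?<kw> \w+ ) \[          # the keyword immediately before the opening bracket
    (?<body>                 # everything up to the matching closing bracket
        (?: [^\[\]]++        #   a run of text with no brackets
          | \[ (?&body) \]   #   or a nested bracketed part, matched by recursion
        )*
    )
    \]
/x;

# Hypothetical helper: returns [keyword, body] pairs for every group,
# outer group first, descending recursively into each body.
sub extract_groups {
    my ($text) = @_;
    my @found;
    while ($text =~ /$GROUP/g) {
        my ($kw, $body) = ($+{kw}, $+{body});   # copy before the recursive call
        push @found, [ $kw, $body ], extract_groups($body);
    }
    return @found;
}

my $line = '123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123';
for my $group (extract_groups($line)) {
    my ($kw, $body) = @$group;
    my $has = $body =~ /\be_\Q$kw\E\b/ ? 'contains' : 'lacks';
    print "$kw $has e_$kw\n";   # "FOO1 contains e_FOO1", then "FOO2 lacks e_FOO2"
}
```

Whether FOO2 lacking e_FOO2 should count as an error depends on the real rule, which the question does not pin down; the extraction itself is the reusable part.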

Upvotes: 0

Sinan Ünür

Reputation: 118158

You need to write a parser for your mini-language: see Parse::RecDescent. Its calculator demo would be a good starting place.

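#!/usr/bin/perl

use strict;
use warnings;

my ($expr) = @ARGV;
defined $expr or die "usage: $0 EXPRESSION\n";

# Append a space so that the last word of the expression is checked too.
my @tokens = split //, "$expr ";

my ($word, $inside) = (q{}, 0);    # current word; bracket nesting depth

for my $token (@tokens) {
    # Word characters (letters, digits, '_') accumulate into $word.
    $token =~ /\A\w\z/ and do { $word .= $token; next };

    # Any other character ends the current word, so check it now.
    if ( $inside ) {
        # Inside brackets, FOO1 may only appear as e_FOO1.
        if ( $word =~ /FOO1/ ) {
            $word eq 'e_FOO1'
                or die "No FOO1 w/o e_ prefix allowed!\n"
        }
    }
    else {
        # Outside brackets, FOO1 may not appear at all.
        $word !~ /FOO1/
            or die "No FOO1 allowed!\n";
    }

    $token eq '[' and ++$inside;
    $token eq ']' and --$inside;
    $word = '';
}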

C:\Temp> t.pl "123 + MOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123"

C:\Temp> t.pl "123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123"
No FOO1 allowed!

C:\Temp> t.pl "123 + MOO1[ccc + FOO1 + ddd + FOO2[b_FOO2]] = 123"
No FOO1 w/o e_ prefix allowed!
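
As a dependency-free illustration of the recursive-descent idea (this is not Parse::RecDescent's API), here is a small hand-written parser that builds a tree of nested KEYWORD[...] groups and then walks it recursively. The grammar and the rule it checks are assumptions inferred from the examples: inside KEYWORD[...], any word of the form x_NAME (one lowercase letter plus '_') must have NAME equal to KEYWORD, which fits both e_FOO1 and b_FOO2 in the sample. All sub names are hypothetical:

```perl
use strict;
use warnings;

# Assumed grammar:  seq := item*
#                   item := WORD '[' seq ']' | WORD | other characters
# parse_seq reads items from $$src at pos($$src) until ']' or end of input.
# Plain words become strings; KEYWORD[...] becomes { keyword, items }.
sub parse_seq {
    my ($src) = @_;                          # reference to the input string
    my @items;
    while (1) {
        if ($$src =~ /\G(\w+)\[/gc) {        # KEYWORD[ opens a nested group
            my $keyword = $1;                # save it before recursing
            my $inner   = parse_seq($src);   # recursion handles any depth
            $$src =~ /\G\]/gc or die "missing ']' to close $keyword\n";
            push @items, { keyword => $keyword, items => $inner };
        }
        elsif ($$src =~ /\G(\w+)/gc)      { push @items, $1 }   # plain word
        elsif ($$src =~ /\G[^\w\[\]]+/gc) { }                   # operators, spaces
        else                              { last }              # ']' or end
    }
    return \@items;
}

# parse: parse a whole line; a stray ']' or a bare '[' is rejected.
sub parse {
    my ($text) = @_;
    pos($text) = 0;
    my $tree = parse_seq(\$text);
    my $at = pos($text);
    $at == length $text
        or die "unexpected '" . substr($text, $at, 1) . "' at offset $at\n";
    return $tree;
}

# check_tree: walk the tree recursively and collect rule violations.
sub check_tree {
    my ($items, $keyword) = @_;              # $keyword is undef at top level
    my @errors;
    for my $item (@$items) {
        if (ref $item) {
            push @errors, check_tree($item->{items}, $item->{keyword});
        }
        elsif (defined $keyword and $item =~ /\A[a-z]_(\w+)\z/ and $1 ne $keyword) {
            push @errors, sprintf '%s[...] refers to %s', $keyword, $1;
        }
    }
    return @errors;
}

for my $line ('123 + FOO1[ccc + e_FOO1 + ddd + FOO2[b_FOO2]] = 123',
              '123 + FOO1[ccc + e_FOK1 + ddd + FOO2[b_FOO2]] = 123') {
    my @errors = check_tree(parse($line));
    print(@errors ? "Bad: @errors\n" : "Good\n");
}
```

Because parse_seq calls itself for every KEYWORD[, nesting depth is handled naturally, and unbalanced brackets are reported as parse errors rather than silently ignored.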

Upvotes: 2
