jdamae

Reputation: 3909

perl regex - multiple pattern matching, optional matching

I'm stuck on this regex. It's matching 2 of my 3 file names, and I need help getting all three if possible. I also want to extract one of the values abc|def|ghi, as well as the locale name ucsb|tech that precedes the extension (.edu | .net), into variables.

Would like to do this in one pass if possible. Thanks.

/home/test/abc/.last_run_dir
/home/test/def/[email protected]
/home/test/ghi/.last_file_sent.dp3.tech.net

It's not picking up the first line:

/home/test/abc/.last_run_dir

Regex:

$line =~ m#home/test/(\w{3}).*[.](\w+)[.].*#

Code:

my $file = 'Index.lst';
open my $FILE, '<', $file or die "unable to open '$file' for reading: $!";
while (my $line = <$FILE>) {
    chomp($line);
    if ($line =~ m#home/test/(\w{3}).*[.](\w+)[.].*#) {
        open my $file2, '<', $line or die "unable to open '$line' for reading: $!";
        while (my $line2 = <$file2>) {
            print $line2;
        }
        close $file2;
    }
} #end while
close $FILE;

Also, how do I print out my possible matches if they are optional?

Upvotes: 2

Views: 4175

Answers (3)

bourbaki

You could do something like:

#!/usr/bin/perl
use strict;
use warnings;

while(my $line=<DATA>) {
    chomp($line);
    if ($line =~ m#home/test/(\w{3})/\.(\w+)(?:.*\.(\w+)\.[^.]+)?$#) {
        print "$line\n";
        print "1=$1\t2=$2\t3=", $3 // '', "\n";
    }
}

__DATA__
/home/test/abc/.last_run_dir
/home/test/def/[email protected]
/home/test/ghi/.last_file_sent.dp3.tech.net

Output:

/home/test/abc/.last_run_dir
1=abc   2=last_run_dir  3=
/home/test/def/[email protected]
1=def   2=last_file_sent    3=ucsb
/home/test/ghi/.last_file_sent.dp3.tech.net
1=ghi   2=last_file_sent    3=tech
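
As for printing optional matches, one possible approach (a sketch, not part of the answer above) is to use named captures, available from Perl 5.10, and substitute a placeholder for any group that did not take part in the match. The helper name describe_captures and the capture names dir, name and locale are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "key=value" strings for each named capture,
# or an empty list when the line does not match at all.
sub describe_captures {
    my ($line) = @_;
    return () unless $line =~ m{
        home/test/(?<dir>\w{3})/[.](?<name>\w+)   # required parts
        (?:.*[.](?<locale>\w+)[.][^.]+)?          # optional locale and extension
        \z
    }x;

    # An optional group that did not participate in the match is undef
    # in %+, so it gets a placeholder instead of triggering a warning.
    my @out;
    for my $key (qw(dir name locale)) {
        my $value = $+{$key} // '(not matched)';
        push @out, "$key=$value";
    }
    return @out;
}

print join("\t", describe_captures($_)), "\n"
    for '/home/test/abc/.last_run_dir',
        '/home/test/ghi/.last_file_sent.dp3.tech.net';
```

With numbered captures the same idea applies: test each one with defined (or use //) before printing it.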

Upvotes: 4

Dogweather

Reputation: 16769

The part of your regex after the \w{3} forces it to look for the next dot-word-dot:

[.](\w+)[.].*

A simple fix is to make this optional. But when you do, you'll probably need to lock down that first .*: specify that it can be any string of characters, but not a period. (A good practice to do in general, btw.)

$line =~ m#home/test/(\w{3})[^.]*([.](\w+)[.].*)?#

EDIT: I see that my solution might need a bit of testing to check for the periods in the right places, fyi.
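
To illustrate the idea of an optional tail combined with "anything but a period", here is a minimal sketch. It is not the exact regex above: it assumes the file name always starts with a dot, uses a non-capturing group so the locale stays in the second capture, and anchors at the end of the string. The name parse_path and the middle sample path are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Optional tail as a non-capturing group, with negated classes instead of ".*".
my $re = qr{
    home/test/(\w{3})/   # capture 1: three-character directory
    [.][^.]+             # leading dot and first name component
    (?:                  # optional tail: only present with an extension
        (?:[.][^.]+)*?   # any intermediate dot-separated components
        [.](\w+)         # capture 2: component just before the extension
        [.][^.]+         # the extension itself
    )?
    \z
}x;

# Returns (dir, locale); locale is undef when the optional tail is absent.
# Returns an empty list when the path does not match at all.
sub parse_path {
    my ($path) = @_;
    return $path =~ $re ? ($1, $2) : ();
}

# The middle path is a hypothetical stand-in, not taken from the question.
for my $path (qw(
    /home/test/abc/.last_run_dir
    /home/test/def/.last_file_sent.example.ucsb.edu
    /home/test/ghi/.last_file_sent.dp3.tech.net
)) {
    my ($dir, $locale) = parse_path($path);
    printf "%s\tdir=%s\tlocale=%s\n",
        $path, $dir // '(no match)', $locale // '(none)';
}
```

Because every component is matched with [^.]+ rather than .*, each dot in the path has to line up with a dot in the pattern, which makes it easier to reason about where the periods may appear.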

Upvotes: 3

jxstanford

Reputation: 3387

Your regex requires two instances of "." to match. If the second one is optional, use [.]? for it:

$line =~ m#home/test/(\w{3}).*[.](\w+)[.]?.*#;
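
A quick way to check what each variant does is to run both patterns against the sample paths and look at the second capture. The sketch below does that for the first and third paths from the question; the labels and the helper name second_capture are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Both patterns are copied from this thread; only the labels are made up.
my @patterns = (
    [ 'two dots required'   => qr#home/test/(\w{3}).*[.](\w+)[.].*#  ],
    [ 'second dot optional' => qr#home/test/(\w{3}).*[.](\w+)[.]?.*# ],
);

# Returns the second capture, or undef when the pattern does not match.
sub second_capture {
    my ($re, $path) = @_;
    return $path =~ $re ? $2 : undef;
}

for my $path ('/home/test/abc/.last_run_dir',
              '/home/test/ghi/.last_file_sent.dp3.tech.net') {
    for my $p (@patterns) {
        my ($label, $re) = @$p;
        my $got = second_capture($re, $path);
        printf "%-20s %s => %s\n", $label, $path, $got // '(no match)';
    }
}
```

Note that with the second dot optional, the greedy .* runs as far right as it can, so on the third path the second capture ends up being the extension (net) rather than the locale (tech). The relaxed pattern does match all the lines, but it may not extract the value the question asks for.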

Upvotes: -1
