deep

Reputation: 716

Perl Regex Extraction

I have a file which contains the details as follows:

/var/example/12.1.1.0-gn/product
/var/example/12.1.1.0-xn/product
              .
              .
/var/example/13.1.1.0-gn/product
/var/example/13.1.1.0-xn/product

I would like to take the above paths and insert a new path component, so that they become:

/var/example/12.1.1.0/12.1.1.0-gn/product
/var/example/12.1.1.0/12.1.1.0-xn/product
              .
              .
/var/example/13.1.1.0/13.1.1.0-gn/product
/var/example/13.1.1.0/13.1.1.0-xn/product

I have written the script below, where $new_add represents the part to be added to the new path. I am trying to do this with a regex so that the script generalizes. I am a newbie to Perl, so please guide me if I am wrong somewhere. Thank you.

open (FH) or dir ("Could not open the file");
foreach $line (<FH>){
     ($a, $b, $c, $d, $e, $f) = split ('/', $line);
      chomp ($line);
      print "$a, $b, $c, $d $e $f\n";
      if ($e =~ m/^\d.\d.\d.\d-\d+/){
          $new_add = $e;
          print "Match";
      }
 }

Upvotes: 1

Views: 548

Answers (3)

bonsaiviking

Reputation: 6005

Your Perl style is based on Perl 4. Adopting some better practices will make your Perl-writing life much easier. First, a quick solution to your problem:

#!/usr/bin/perl -p
use strict;
use warnings;
s{/(\d+\.\d+\.\d+\.\d+)-}{/$1/$1-};

This will match your 4-part version string, capturing it and making it another element in your directory path. Now, to address your script and show you some better Perl:

First, always always ALWAYS start your script with use strict; use warnings;. This will enforce some stricter interpretation of your script, which is great, since Perl will usually assume that it knows what you want, and do whatever possible to avoid causing an error. The most visible thing that use strict; does is force lexical scoping, which means that you must declare your variables with my.
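To make that concrete, here is a minimal sketch of what use strict catches. It compiles a snippet inside a string eval and returns the error text. The sub name strict_error and the variable names are made up for this illustration.

```perl
use strict;
use warnings;

# Compile and run a tiny snippet under "use strict".
# Returns the error text, or an empty string if the snippet ran cleanly.
# (strict_error is a hypothetical helper name, not part of Perl.)
sub strict_error {
    my ($code) = @_;
    local $@;
    my $ok  = eval "use strict; $code; 1";
    my $err = $ok ? '' : $@;
    return $err;
}

# Undeclared variable: strict refuses to compile it.
my $bad  = strict_error('$undeclared = 1');
# Declared with my: compiles and runs fine.
my $good = strict_error('my $declared = 1');

print "undeclared: $bad";
print "declared:   ", ($good eq '' ? "no error\n" : $good);
```

The first call should report a "Global symbol ... requires explicit package name" error, which is the message you will see most often after turning strict on.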

So your first line (after use strict; use warnings;) is:

open (FH) or dir ("Could not open the file");

Perl will now complain about a few things. First, file handles are variables! So we need to declare them like so: my $fh. Stick with lower-case variable names; it's more readable. Perl also doesn't like that bareword dir. I think you meant die, which is a keyword:

open my $fh or die "Could not open the file";

Ok, so we eliminated some unnecessary parentheses, making the line much more readable. But the file still can't be opened, because you haven't provided a file name! There are many ways to use open, but the best one for most purposes is the 3-argument form. The arguments are: filehandle, mode, and filename. In this case, we want to read from the file, so the mode is "<":

open my $fh, "<", "test.txt" or die "Could not open the file";
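If you keep the manual error check, it usually helps to include $! (the operating system's error message) so you can tell a missing file from a permissions problem. A minimal sketch, where open_for_reading and the file name no-such-file.txt are placeholders invented for this example:

```perl
use strict;
use warnings;

# Try to open a file for reading with the 3-argument form of open.
# Returns (filehandle, undef) on success or (undef, error message) on failure.
# (open_for_reading is a hypothetical helper name.)
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or return (undef, "Could not open '$path': $!");
    return ($fh, undef);
}

# "no-such-file.txt" is a placeholder chosen to exercise the failure branch.
my ($fh, $err) = open_for_reading('no-such-file.txt');
print defined $fh ? "opened\n" : "$err\n";
```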

This would be a good time to point out that you can leave the error handling up to Perl by including use autodie; at the top of your script. Now your script looks like this:

#!/usr/bin/perl

use strict; 
use warnings;
use autodie;

open my $fh, "<", "test.txt";

foreach my $line (<$fh>){

Now, foreach is a synonym for for; I prefer for because it saves some typing. $line is declared lexically (my), and the diamond operator (<>) now surrounds our lexical filehandle $fh. Unfortunately, a foreach loop evaluates <$fh> in list context, which pulls the entire file into memory and could be problematic for large files. If we use a while loop instead, then each line is read, processed, and discarded as we pass through the loop:

while (my $line = <$fh>) {
    ($a, $b, $c, $d, $e, $f) = split ('/', $line);

Now look at this! Lots of variables that need to be lexically scoped. One way would be to use a single my declaration for all of them: my ($a, $b, $c, $d, $e, $f). A better idea would be to notice that we have a series of items that are alike. This could probably be better written with an array:

my @path = split '/', $line;

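There, that's nice! Now I'm not sure why you decided to chomp the line next; by that point the line has already been split and $line isn't used again, so the chomp has no effect and we'll skip it. (If you want the trailing newline removed from the last element, chomp before the split.) The next line has to be modified to use our new @path variable: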

print join(", ", @path), "\n";

Using join means we don't have to know how many elements we split the line into. We also see (from this output) that the fourth element (index 3) of @path is the one with the version string we want to match, but the regular expression is a little off.

if ($path[3] =~ m/^\d.\d.\d.\d-\d+/){

This is looking for a series of single digits separated by any character, and followed by more digits after a "-". Your example shows that some of these should be multiple digits, and we should be matching literal "." (period, full stop) instead of regex "." (any character), and the last part can be letters ("xn", "gn", etc.). Here's a regex to match that:

if ($path[3] =~ m/^(\d+\.\d+\.\d+\.\d+)-../){

You'll notice we added + to mean "one or more" and \ to escape the . characters. One more thing, we added grouping parentheses () to capture the version string, separate from the rest of the string, since that's what you want as a directory name. This capture will be stored in the $1 variable, so the next line is now:

my $new_add = $1;
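Here is a small sketch to check the corrected pattern against the directory names from the question, next to the original pattern. The sub name version_of is made up for this illustration.

```perl
use strict;
use warnings;

# Return the captured version string, or undef when the name does not match.
# (version_of is a hypothetical helper name.)
sub version_of {
    my ($name) = @_;
    return $name =~ m/^(\d+\.\d+\.\d+\.\d+)-../ ? $1 : undef;
}

for my $name ('12.1.1.0-gn', '13.1.1.0-xn', 'product') {
    my $version = version_of($name);
    print "$name => ", (defined $version ? $version : 'no match'), "\n";
}

# The original pattern does not match the same input.
my $old = '12.1.1.0-gn' =~ m/^\d.\d.\d.\d-\d+/ ? 'match' : 'no match';
print "original pattern: $old\n";
```

Testing the match result before using $1 matters: after a failed match, $1 still holds whatever an earlier successful match left in it.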

And that's about it. Obviously, you'll have more work to finish up your script, but hopefully I've given you some tools to make your Perl experience better. And if all you wanted was a quick solution, that's way up there at the top.
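For reference, one way the remaining work could go is sketched below. It uses the split, the match, and a join to rebuild the path. This is only a sketch under my own assumptions: the sub name add_version_dir is invented, and it looks for the version in any path element rather than only at index 3.

```perl
use strict;
use warnings;

# Insert the bare version as its own directory in front of the
# "version-suffix" element. Lines without such an element come back
# unchanged apart from the removed newline.
# (add_version_dir is a hypothetical helper name.)
sub add_version_dir {
    my ($line) = @_;
    chomp $line;
    my @path = split '/', $line, -1;    # -1 keeps trailing empty fields
    for my $i (0 .. $#path) {
        if ($path[$i] =~ m/^(\d+\.\d+\.\d+\.\d+)-/) {
            splice @path, $i, 0, $1;    # put the version before this element
            last;
        }
    }
    return join '/', @path;
}

# Sample lines taken from the question.
for my $line ("/var/example/12.1.1.0-gn/product\n",
              "/var/example/13.1.1.0-xn/product\n") {
    print add_version_dir($line), "\n";
}
```

In a real script you would call this from inside the while (my $line = <$fh>) loop instead of looping over a hard-coded list.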

If you want to continue programming in Perl, I'd recommend getting a book that teaches Perl 5, preferably one written in the last 5 or 6 years. One I'd highly recommend is Modern Perl, which is also available for free online.

Upvotes: 4

Kenosis

Reputation: 6204

Perhaps the following will be helpful:

use strict;
use warnings;

while (<>) {
    s!(/\d[^-]+)!$1$1!;
    print;
}

Usage: perl script.pl inFile [>outFile]

The optional >outFile is a shell redirection that sends the output to a file.

Or as a one-liner: perl -pe 's!(/\d[^-]+)!$1$1!' inFile [>outFile]

Output on your dataset:

/var/example/12.1.1.0/12.1.1.0-gn/product
/var/example/12.1.1.0/12.1.1.0-xn/product
/var/example/13.1.1.0/13.1.1.0-gn/product
/var/example/13.1.1.0/13.1.1.0-xn/product

Upvotes: 3

perreal

Reputation: 98118

use strict;
use warnings;

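while (my $line = <>){
    chomp $line;    # drop the trailing newline before splitting
    my (@v) = split ('/', $line);
    print join(" ", @v), "\n";
    # the second-to-last element is the version directory;
    # capture everything before the first "-"
    if (my ($new_add) = $v[-2] =~ m/([^-]*)/){
        print "Match $new_add\n";
    }
}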

Upvotes: 0
