Guido Pujadas

Reputation: 33

Extracting a substring with a regex in Perl

I have a simple problem (I think) extracting information from a file in Perl. The file has approximately 200,000 lines, and some of them have this format:

WO      GB111MTD1                    dddD-51   TIME 141202 0105  PAGE  1

I want to store GB111MTD1 in a variable, and I know that the word "WO" always comes first.

What I have tried is the following:

open(archive, '<', "C:/Users/g/Desktop/c/alarms.log") or die "Cannot open alarms.log: $!\n";

while(<archive>){
        if($_ =~ /^WO\s+(.*)/){
            print "Found: $1\n";
            last;
        }
}

This prints everything after "WO" to the end of the line, but I only want "GB111MTD1".

Next attempt:

while(<archive>){
        if($_ =~ /^WO\s+(.*)\S/){
            print "Found: $1\n";
            last;
        }
}

Here I want to say: "if the line begins with WO followed by some whitespace, match whatever comes next until another whitespace character is found".

The only difference is that the final "1" of "WO GB111MTD1 dddD-51 TIME 141202 0105 PAGE 1" is no longer shown, but it is still not what I want.

I hope you understand my problem.
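Why the second attempt only loses the final "1": (.*) is greedy, so it first consumes the rest of the line and then gives back only as many characters as the remainder of the pattern needs. A minimal sketch that runs both attempts against the sample line from the question (the variable names $attempt1 and $attempt2 are illustrative only):

```perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = 'WO      GB111MTD1                    dddD-51   TIME 141202 0105  PAGE  1';

# Attempt 1: (.*) is greedy and captures everything after "WO" and the
# following whitespace, up to the end of the line.
my ($attempt1) = $line =~ /^WO\s+(.*)/;

# Attempt 2: (.*) again runs to the end of the line, then the regex engine
# backtracks a single character so that \S can match the final "1".
my ($attempt2) = $line =~ /^WO\s+(.*)\S/;

print "Attempt 1: [$attempt1]\n";
print "Attempt 2: [$attempt2]\n";
```

Because . also matches spaces, neither capture stops after GB111MTD1; a pattern that cannot cross whitespace, such as (\S+), is needed for that.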

Upvotes: 1

Views: 112

Answers (2)

Len Jaffe

Reputation: 3484

I would use split on lines that start with WO.

 use warnings;
 use strict;
 use feature 'say';

 while (<DATA>) {
     if (/^WO\s/) {
         my @fields = split(/\s+/);
         my $myvar = $fields[1];
         # do stuff with $myvar
         say "Frobnicating order # $myvar";
     }
 }

 __DATA__
 WO GB111MTD1 dddD-51 TIME 141202 0105 PAGE 1
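A minimal sketch of the same split idea packaged as a reusable check, assuming you want undef back for lines that do not start with "WO". The helper name wo_field is hypothetical. It uses split ' ', Perl's special awk-style form, which splits on runs of whitespace and also ignores leading whitespace:

```perl
use strict;
use warnings;

# Hypothetical helper: return the second whitespace-separated field of a
# line that starts with "WO", or undef for any other line.
sub wo_field {
    my ($line) = @_;
    return unless $line =~ /^WO\s/;
    # split ' ' splits on runs of whitespace, so the wide gaps in the
    # log line do not produce empty fields.
    my @fields = split ' ', $line;
    return $fields[1];
}

my @lines = (
    "WO      GB111MTD1                    dddD-51   TIME 141202 0105  PAGE  1\n",
    "some other line\n",
);

for my $line (@lines) {
    my $id = wo_field($line);
    print "Found: $id\n" if defined $id;
}
```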

Upvotes: 1

toolic

Reputation: 62037

You can use \S, which matches a non-whitespace character; \S+ captures a run of them and stops at the next whitespace:

use warnings;
use strict;

while (<DATA>) {
    if (/^WO\s+(\S+)/) {
        print "Found: $1\n";
        last;
    }
}

__DATA__
WO GB111MTD1 dddD-51 TIME 141202 0105 PAGE 1

Prints:

Found: GB111MTD1
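A minimal sketch of applying the same /^WO\s+(\S+)/ pattern to the log file from the question, assuming you want every matching line rather than only the first (so there is no last). The helper name wo_ids is hypothetical, and the example reads from an in-memory string so it runs without the real file; the commented open line shows where the question's path would go:

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from an already-open filehandle and
# return the field that follows "WO" on every matching line.
sub wo_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        # \S+ cannot match a space, so the capture ends right after the
        # identifier.
        push @ids, $1 if $line =~ /^WO\s+(\S+)/;
    }
    return @ids;
}

# For illustration, read from an in-memory string. For the real log you
# would open the file instead, e.g.:
#   open(my $fh, '<', 'C:/Users/g/Desktop/c/alarms.log')
#       or die "Cannot open alarms.log: $!\n";
my $sample = "WO      GB111MTD1                    dddD-51   TIME 141202 0105  PAGE  1\n"
           . "unrelated line\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!\n";
my @ids = wo_ids($fh);
close($fh);

print "Found: $_\n" for @ids;
```

Reading line by line with while keeps memory use flat, which matters for a file of around 200,000 lines.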

Upvotes: 4
