user4409082

Reputation: 103

regex in perl script

I have a file with many lines from the following format

 00000000000 00000000 0 MMM_WR  0            000004            00000abc
 00000000000 00000000 0 MMM_WR  0            000008            0000000c
...

I want to extract the last two words of each line into variables in my Perl script and print them.

I tried

$line =~ m/^.+  MMM_WR  0\s+(\w+)\s+(\w+)/;

print $1;
print $2;

But $1 and $2 are always uninitialized.

Any advice, please?

Upvotes: 0

Views: 64

Answers (4)

Polar Bear

Reputation: 6798

You can split the string into an array and print the last two elements:

use strict;
use warnings;
use feature 'say';

while(<DATA>) {
    my @data = (split ' ')[5,6];
    say join ' ', @data;
}

__DATA__
 00000000000 00000000 0 MMM_WR  0            000004            00000abc
 00000000000 00000000 0 MMM_WR  0            000008            0000000c

Output

000004 00000abc
000008 0000000c
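Since the question asks for the *last* two words, a minimal sketch (assuming the sample lines are representative) can index the split result from the end with [-2, -1] instead of the fixed positions [5, 6], so it keeps working if the number of leading columns changes. The lines are held in an array here rather than under __DATA__ so the snippet stands alone.

```perl
use strict;
use warnings;
use feature 'say';

# Sample lines copied from the question (leading space included).
my @lines = (
    ' 00000000000 00000000 0 MMM_WR  0            000004            00000abc',
    ' 00000000000 00000000 0 MMM_WR  0            000008            0000000c',
);

my @results;
for my $line (@lines) {
    # split ' ' skips leading whitespace; the slice [-2, -1] takes the
    # last two fields no matter how many fields come before them.
    my ($first, $second) = (split ' ', $line)[-2, -1];
    push @results, "$first $second";
    say "$first $second";
}
```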

Another variation, using a regex match:

use strict;
use warnings;
use feature 'say';

while (<DATA>) {
    my @data = $_ =~ /MMM_WR\s+\d\s+(\d{6})\s+(.+)/;
    say join ' ', @data;
}

__DATA__
 00000000000 00000000 0 MMM_WR  0            000004            00000abc
 00000000000 00000000 0 MMM_WR  0            000008            0000000c

Output

000004 00000abc
000008 0000000c
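To get the values into named scalars, as the question asks, a sketch can assign the match result in list context and test it in the same condition. A failed match returns an empty list, so the `if` is false and nothing reads an undefined $1/$2. The names $first and $second are only illustrative.

```perl
use strict;
use warnings;

my @lines = (
    ' 00000000000 00000000 0 MMM_WR  0            000004            00000abc',
    ' 00000000000 00000000 0 MMM_WR  0            000008            0000000c',
);

my @pairs;
for my $line (@lines) {
    # List assignment in boolean context is true only if the match
    # returned captures, so both variables are defined inside the block.
    if (my ($first, $second) = $line =~ /MMM_WR\s+\d\s+(\w+)\s+(\w+)/) {
        push @pairs, [$first, $second];
        print "$first $second\n";
    }
    else {
        warn "no match: $line\n";
    }
}
```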

Since the columns have fixed widths, the fields of interest can also be extracted with unpack:

use strict;
use warnings;
use feature 'say';

while (<DATA>) {
    my @data = (unpack("A45A6A12A8",$_))[1,3];
    say join ' ', @data;
}

__DATA__
 00000000000 00000000 0 MMM_WR  0            000004            00000abc
 00000000000 00000000 0 MMM_WR  0            000008            0000000c

Output

000004 00000abc
000008 0000000c

A variation using substr with fixed offsets:

use strict;
use warnings;
use feature 'say';

while (<DATA>) {
    my @data;
    $data[0] = substr $_, 45, 6;
    $data[1] = substr $_, 63, 8;
    say join ' ', @data;
}

__DATA__
 00000000000 00000000 0 MMM_WR  0            000004            00000abc
 00000000000 00000000 0 MMM_WR  0            000008            0000000c

Output

000004 00000abc
000008 0000000c

Upvotes: 0

Schwern

Reputation: 164639

You have two spaces before MMM_WR where you should have one.

$line =~ m/^.+ MMM_WR  0\s+(\w+)\s+(\w+)/;
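A small check, using the first sample line from the question, that the original pattern never matches while the one-space version captures the two fields:

```perl
use strict;
use warnings;

my $line = ' 00000000000 00000000 0 MMM_WR  0            000004            00000abc';

# Pattern from the question: two literal spaces before MMM_WR.
my $original  = qr/^.+  MMM_WR  0\s+(\w+)\s+(\w+)/;
# Corrected pattern: a single space before MMM_WR, as in the data.
my $corrected = qr/^.+ MMM_WR  0\s+(\w+)\s+(\w+)/;

my $original_matches = $line =~ $original ? 1 : 0;
my @captures = $line =~ $corrected;   # list context returns ($1, $2)

print "original matches: $original_matches\n";
print "corrected captures: @captures\n";
```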

This sort of thing is safer done with split: split the line on whitespace and grab the fields you want.

my @fields = split(' ', $line);   # ' ' skips the leading space; /\s+/ would give an empty first field
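The sample lines start with a space, which matters for split. With a regex separator such as /\s+/, that leading space produces an empty first field and shifts every index by one. The special ' ' separator (documented in perlfunc split) strips leading whitespace first. A quick comparison:

```perl
use strict;
use warnings;

my $line = ' 00000000000 00000000 0 MMM_WR  0            000004            00000abc';

# Regex separator: the leading space yields an empty field at index 0.
my @regex_fields = split /\s+/, $line;
# Special ' ' separator (awk-style): leading whitespace is skipped.
my @awk_fields   = split ' ', $line;

printf "split /\\s+/ : %d fields, first is '%s'\n", scalar @regex_fields, $regex_fields[0];
printf "split ' '   : %d fields, first is '%s'\n", scalar @awk_fields,   $awk_fields[0];
```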

However, this looks like a fixed-width format, which is better handled with unpack. See perlpacktut for more on that.
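A minimal unpack sketch, assuming the column widths of the sample lines (the same widths as the A45A6A12A8 template in the unpack answer above). Instead of reading and discarding the filler columns, the x code skips bytes, so unpack returns only the two wanted fields:

```perl
use strict;
use warnings;

my @lines = (
    ' 00000000000 00000000 0 MMM_WR  0            000004            00000abc',
    ' 00000000000 00000000 0 MMM_WR  0            000008            0000000c',
);

my @rows;
for my $line (@lines) {
    # Assumed layout: skip 45 bytes, read 6, skip 12, read 8.
    # 'x' skips a byte; 'A' reads text and strips trailing spaces.
    # Whitespace inside the template is ignored by unpack.
    my ($first, $second) = unpack 'x45 A6 x12 A8', $line;
    push @rows, [$first, $second];
    print "$first $second\n";
}
```

This only works while every line keeps exactly these widths; the split and regex approaches tolerate varying spacing.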

Upvotes: 4

Steffen Ullrich

Reputation: 123260

$line =~ m/^.+  MMM_WR  0\s+(\w+)\s+(\w+)/;

This regex expects two spaces in front of MMM_WR:

 00000000000 00000000 0 MMM_WR  0            000004            00000abc

This line provides only a single space before MMM_WR. Thus, your regex (which expects two spaces) cannot match. If you fix the regex to expect only a single space, it works.
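A different fix, not part of this answer, is to use \s+ instead of literal spaces so the pattern no longer depends on how many spaces separate the columns. The second line below is a hypothetical variant with two spaces before MMM_WR, added only to show the difference:

```perl
use strict;
use warnings;

my @lines = (
    # Sample line from the question: one space before MMM_WR.
    ' 00000000000 00000000 0 MMM_WR  0            000004            00000abc',
    # Hypothetical variant: two spaces before MMM_WR.
    ' 00000000000 00000000 0  MMM_WR  0            000008            0000000c',
);

# \s+ accepts any run of whitespace, so the exact spacing no longer matters.
# The leading ^.+ is dropped because an unanchored match searches anyway.
my $pattern = qr/\bMMM_WR\s+0\s+(\w+)\s+(\w+)/;

my @found;
for my $line (@lines) {
    if (my @cols = $line =~ $pattern) {
        push @found, "@cols";
        print "@cols\n";
    }
}
```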

Upvotes: 3

choroba

Reputation: 241758

There seems to be only one space before MMM_WR in your data, but the regex contains two.

Upvotes: 3
