Shaw

Reputation: 1139

perl - How to extract lines from a file based on their position

I am processing a text file to extract lines that contain a timestamp and then performing a calculation on those timestamps. Each line contains a timestamp followed by a message, and I match a regular expression against the message to decide which timestamps to extract.

TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO

Below is pseudocode for the matching I'm carrying out on the file:

... .... ... 


open (my $FH, "<", $file) or die "Cannot open <$file>: $!";
for my $line (<$FH>) {
    if ($line =~ /bar/) {
        my $ts1 = ExtractTimestamp($line);
    } elsif ($line =~ /FOO/) {
        my $ts2 = ExtractTimestamp($line);
    }
}
my $diff = $ts2 - $ts1;

The problem here is that the regular expression finds the first occurrence of the line and extracts that, which leaves me with negative time differences. Are there any modules in Perl, or any technique, that would let me extract occurrences of, let's say, FOO that occur in the file after BAR?

Would appreciate any help here!

Upvotes: 0

Views: 144

Answers (3)

Borodin

Reputation: 126722

This solution uses the range operator (..) to select the lines from the first BAR line through the first FOO line after it. The timestamp in the record is pushed onto the array @ts only if the line is the first or the last in the range.

use strict;
use warnings;

my @ts;
while ( <DATA> ) {
    next unless my $state = /BAR/ .. /FOO/;
    push @ts, /([\d:.]+)/ if $state == 1 or $state =~ /E/;
}

print join(' ... ', @ts), "\n";

__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO

output

20:48:47.353 ... 20:48:52.192
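
To see why the test `$state == 1 or $state =~ /E/` picks out the two ends of the range, it can help to look at what the scalar range operator returns on each line. The following is only an illustrative sketch; the helper name range_states is made up for this example and is not part of the answer above.

```perl
use strict;
use warnings;

# Return the value of the scalar range operator for each line in turn.
# range_states is a hypothetical helper used only for this illustration.
sub range_states {
    my @states;
    for (@_) {
        # '' outside the range; 1, 2, 3, ... inside; "E0" marks the closing line
        my $state = /BAR/ .. /FOO/;
        push @states, $state;
    }
    return @states;
}

my @lines = (
    '20:48:27.159 | FOO',
    '20:48:47.353 | BAR',
    '20:48:49.227 | SPAM',
    '20:48:52.192 | FOO',
);
my @states = range_states(@lines);

# Print '-' where the operator returned the empty string (outside the range).
printf "%-4s %s\n", ( length $states[$_] ? $states[$_] : '-' ), $lines[$_]
    for 0 .. $#lines;
```

For the sample data the states are the empty string, 1, 2 and 3E0, so the first FOO is ignored, the BAR line has state 1, and the closing FOO line is the only one whose state contains an E.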

Upvotes: 5

AnFi

Reputation: 10903

open (my $FH, "<", $file) or die "Cannot open <$file>: $!";
# declare $ts1 and $ts2 OUTSIDE the loop so they are still in scope after it
my ( $ts1, $ts2 );
while ( my $line = <$FH> ) {
    if ( $line =~ /BAR/ ) {
        $ts1 = ExtractTimestamp($line);
    }
    # ignore any FOO seen before the first BAR has set $ts1
    elsif ( defined($ts1) and $line =~ /FOO/ ) {
        $ts2 = ExtractTimestamp($line);
        # stop searching after the first BAR and "FOO after BAR" pair
        last;
    }
}
# only if both BAR and "FOO after BAR" have set their variables
if ( defined($ts1) and defined($ts2) ) {
    my $diff = $ts2 - $ts1;
    ...
}
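
ExtractTimestamp is never shown in the question, so the subtraction `$ts2 - $ts1` only makes sense if it returns a number. Here is a minimal sketch of what it might look like, assuming each line starts with HH:MM:SS.mmm and both events fall on the same day. The body of the function is an assumption for illustration, not the asker's actual code.

```perl
use strict;
use warnings;

# Hypothetical implementation: the original post does not show this function.
# Assumes the line starts with HH:MM:SS(.fraction) and returns the number of
# seconds since midnight, or nothing if the line has no leading timestamp.
sub ExtractTimestamp {
    my ($line) = @_;
    my ( $h, $m, $s ) = $line =~ /^(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)/
        or return;
    return $h * 3600 + $m * 60 + $s;
}

my $ts1 = ExtractTimestamp('20:48:47.353 | BAR');
my $ts2 = ExtractTimestamp('20:48:52.192 | FOO');
printf "%.3f\n", $ts2 - $ts1;    # prints 4.839
```

Returning seconds as a plain number keeps the difference calculation trivial. It does not handle a pair that spans midnight, which would need date information the sample lines do not carry.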

Upvotes: 2

Sobrique

Reputation: 53478

There are several ways to do this in Perl, depending on precisely what you want to accomplish. If I'm reading you right, you want to find both the BAR and FOO timestamps, presumably to compute a delta?

The key question is whether FOO and BAR are always exactly matched.

For example, you could do it with a multi-line regex:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

local $/;

my ( $bar, $foo )  =  <DATA> =~ m/^(\d\S+) \| BAR.*?(\d\S+) \| FOO$/ms;
print "BAR: $bar\nFOO: $foo\n";

__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO

This will match the first instance of paired 'BAR' and 'FOO'. (You can capture multiple pairs if you use the g flag on your regex.)
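
As a rough sketch of the g-flag variant: matching in a while loop with /g returns one BAR/FOO pair per iteration. The helper name bar_foo_pairs and the last two data lines are invented for this illustration, and the second capture is anchored with ^ so that it always starts at the beginning of a line.

```perl
use strict;
use warnings;

# Collect every BAR timestamp together with the first FOO timestamp after it.
# bar_foo_pairs is a hypothetical helper name used for this example.
sub bar_foo_pairs {
    my ($text) = @_;
    my @pairs;
    # /g resumes after the previous match, so each pass finds the next pair
    while ( $text =~ m/^(\d\S+) \| BAR.*?^(\d\S+) \| FOO$/msg ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

# Sample data from the question, plus two made-up lines to form a second pair.
my $text = <<'END';
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
20:49:01.000 | BAR
20:49:03.500 | FOO
END

printf "BAR: %s  FOO: %s\n", @$_ for bar_foo_pairs($text);
```

With this input the loop yields two pairs, and the leading FOO that has no preceding BAR is skipped.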

Alternatively, you can set the record separator to "FOO\n", so that each read returns everything up to and including the next FOO line:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

local $/ = "FOO\n"; 

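while ( <DATA> ) {

   my ( $foo ) = m/(\S+) \| FOO/;
   my ( $bar ) = m/(\S+) \| BAR/;
   # skip chunks that lack either one, e.g. a FOO seen before any BAR
   next unless defined $foo and defined $bar;
   print "$foo $bar\n";

}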

__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO

Or do what you're already doing and iterate line by line:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

my $last_bar;
while (<DATA>) {

    if (m/^(\d\S+) \| BAR/) {
        $last_bar = $1;
    }
    if ( my ($foo) = m/^(\d\S+) \| FOO/ ) {
        if ($last_bar) {
            print "$foo $last_bar\n";
        }
        else {
            print "Unmatched:\n";
            print;
        }
        $last_bar = undef;
    }
}

__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO

Upvotes: 0
