Reputation: 1139
I am processing a text file to extract the lines that contain a timestamp and then performing a calculation on those timestamps. Each line contains a timestamp followed by a message, and I match on the message with a regular expression to decide which timestamps to extract.
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
Below is pseudocode for the regular expression matching I'm carrying out on the file:
... .... ...
open (my $FH, "<", $file) or die "Cannot open <$file>: $!";
for my $line (<$FH>) {
    if ($line =~ /bar/) {
        my $ts1 = ExtractTimestamp($line);
    } elsif ($line =~ /FOO/) {
        my $ts2 = ExtractTimestamp($line);
    }
}
my $diff = $ts2 - $ts1;
The problem is that the regular expression matches the first occurrence of FOO in the file, which comes before BAR, so the difference comes out negative. Are there any Perl modules or techniques that would let me extract only the occurrences of, say, FOO that appear in the file after BAR?
Would appreciate any help here!
Upvotes: 0
Views: 144
Reputation: 126722
This solution uses the range operator to find the first BAR line and the first FOO line after it. The time in a record is pushed onto the array @ts if the record is either the first or the last line in the range.
use strict;
use warnings;
my @ts;
while ( <DATA> ) {
    next unless my $state = /BAR/ .. /FOO/;
    push @ts, /([\d:.]+)/ if $state == 1 or $state =~ /E/;
}
print join(' ... ', @ts), "\n";
__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
Output:
20:48:47.353 ... 20:48:52.192
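To see why the test `$state == 1 or $state =~ /E/` picks out the two endpoints, it can help to print what the range operator returns for each line. The following is only a small sketch: `range_states` is a hypothetical helper name, and the lines are the sample data from the question held in an array rather than read from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the scalar value of /BAR/ .. /FOO/ for each
# line, in order. Outside the range the operator returns the empty string;
# inside it returns a sequence number, with "E0" appended on the last line.
sub range_states {
    my @lines = @_;
    my @states;
    for (@lines) {
        my $state = /BAR/ .. /FOO/;
        push @states, $state;
    }
    return @states;
}

my @lines = (
    "TIME | MESSAGE",
    "20:48:27.159 | FOO",
    "20:48:47.353 | BAR",
    "20:48:49.227 | SPAM",
    "20:48:52.192 | FOO",
);

my @states = range_states(@lines);
for my $i ( 0 .. $#lines ) {
    my $shown = $states[$i] eq '' ? '(false)' : $states[$i];
    printf "%-8s %s\n", $shown, $lines[$i];
}
```

For the sample data the states are false, false, 1, 2 and 3E0, so the BAR line is the one with state 1 and the closing FOO line is the one whose state contains an E. The early FOO is ignored because the range has not been opened yet.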
Upvotes: 5
Reputation: 10903
open (my $FH, "<", $file) or die "Cannot open <$file>: $!";
# define $ts1 and $ts2 OUTSIDE the "for" loop so they are still in scope after it
my ( $ts1, $ts2 );
for my $line (<$FH>) {
    # the sample data uses upper-case BAR, so match that
    if ( $line =~ /BAR/ ) {
        $ts1 = ExtractTimestamp($line);
    }
    # ignore any FOO seen before the first BAR has set $ts1
    elsif ( defined($ts1) and $line =~ /FOO/ ) {
        $ts2 = ExtractTimestamp($line);
        # stop searching after the first BAR and "FOO after BAR" pair
        last;
    }
}
# only if both BAR and "FOO after BAR" have set their variables
if ( defined($ts1) and defined($ts2) ) {
    my $diff = $ts2 - $ts1;
    ...
}
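The subtraction `$ts2 - $ts1` only makes sense if `ExtractTimestamp` returns a number. The question does not show that function, so the version below is a guess at what it might look like: it assumes every timestamp is at the start of the line in HH:MM:SS.mmm form and that both timestamps fall on the same day (no midnight rollover).

```perl
use strict;
use warnings;

# Hypothetical implementation of the question's ExtractTimestamp: convert a
# leading "HH:MM:SS.mmm" timestamp into seconds since midnight.
# Returns undef (empty list in list context) if the line has no timestamp.
sub ExtractTimestamp {
    my ($line) = @_;
    my ( $h, $m, $s ) = $line =~ /^(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)/
        or return;
    return $h * 3600 + $m * 60 + $s;
}

my $ts1 = ExtractTimestamp("20:48:47.353 | BAR");
my $ts2 = ExtractTimestamp("20:48:52.192 | FOO");

# Format to milliseconds to hide floating-point noise in the difference
printf "%.3f seconds\n", $ts2 - $ts1;
```

With the BAR and FOO lines from the sample data this prints 4.839 seconds.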
Upvotes: 2
Reputation: 53478
There are several ways to do this in Perl, depending on precisely what you want to accomplish. If I'm reading you right, you want to find both the FOO and BAR timestamps, presumably to extract a delta.
The key question is: are FOO and BAR always exactly paired?
You could do it via a multi-line regex:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
local $/;
my ( $bar, $foo ) = <DATA> =~ m/^(\d\S+) \| BAR.*?(\d\S+) \| FOO$/ms;
print "BAR: $bar\nFOO: $foo\n";
__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
This will match the first paired instance of BAR followed by FOO. (You can capture multiple pairs if you use the g flag on your regex.)
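As a rough sketch of the g-flag variant, the loop below collects every BAR/FOO pair from a string. `bar_foo_pairs` is a hypothetical helper name, the pattern adds a `^` before the second capture so that it always starts at the beginning of a line, and the last two log lines are invented sample data added only so there is a second pair to find.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [BAR timestamp, FOO timestamp]
# array refs, one for each BAR that is followed by a FOO.
sub bar_foo_pairs {
    my ($text) = @_;
    my @pairs;
    # /g in scalar context resumes after the previous match on each pass
    while ( $text =~ m/^(\d\S+) \| BAR.*?^(\d\S+) \| FOO$/msg ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

my $log = <<'END';
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
20:49:01.000 | BAR
20:49:03.500 | FOO
END

# The final BAR/FOO lines above are made-up data for illustration
print "BAR: $_->[0]  FOO: $_->[1]\n" for bar_foo_pairs($log);
```

The leading FOO is skipped because the match can only begin on a BAR line, and each later match starts searching after the FOO that closed the previous pair.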
Alternatively, you can set the record separator to FOO:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
local $/ = "FOO\n";
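while ( <DATA> ) {
    my ( $foo ) = m/(\S+) \| FOO/;
    my ( $bar ) = m/(\S+) \| BAR/;
    # skip chunks without a BAR (e.g. a FOO seen before any BAR),
    # which would otherwise trigger an "uninitialized value" warning
    next unless defined $foo and defined $bar;
    print "$foo $bar\n";
}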
__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
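It may be easier to follow the record separator approach once you see what each read actually returns. This sketch reads the sample data from an in-memory filehandle instead of DATA; `foo_records` is a hypothetical helper name, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records, each ending at "FOO\n"
# (any trailing text after the last FOO comes back as a final record).
sub foo_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "FOO\n";    # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $log = <<'END';
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
END

my $n = 0;
for my $record ( foo_records($log) ) {
    $n++;
    print "--- record $n ---\n$record";
}
```

The sample data comes back as two records: the header plus the early FOO line, and then the BAR, SPAM and FOO lines together. Only the second record contains a BAR, which is why a chunk may have a FOO timestamp but no BAR timestamp.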
Or do what you're already doing and iterate line by line:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
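my $last_bar;
while (<DATA>) {
    # remember the most recent BAR timestamp
    if (m/^(\d\S+) \| BAR/) {
        $last_bar = $1;
    }
    if ( my ($foo) = m/^(\d\S+) \| FOO/ ) {
        if ( defined $last_bar ) {
            print "$foo $last_bar\n";
        }
        else {
            # a FOO with no preceding BAR
            print "Unmatched:\n";
            print;
        }
        # reset so that each BAR pairs with at most one FOO
        $last_bar = undef;
    }
}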
__DATA__
TIME | MESSAGE
20:48:27.159 | FOO
20:48:47.353 | BAR
20:48:49.227 | SPAM
20:48:52.192 | FOO
Upvotes: 0