Reputation: 13
I have log files with time stamps. I want to use sed to search for text between two time stamps, even if the first or the last time stamp is not present.
For example, if I search between 9:30 and 9:40, it should return every line whose time stamp falls between 9:30 and 9:40, even if neither 9:30 nor 9:40 itself appears in the log.
I am using a sed one-liner:
sed -n '/7:30:/,/7:35:/p' xyz.log
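But it only returns the right data if both time stamps are present. If one of the time stamps is missing, the output is wrong; for instance, when the end time stamp is missing, it prints everything through the end of the file. And if the time is in 12-hour format, it pulls data for both AM and PM.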
Additionally, I have different time stamp formats for different log files so I need a generic command.
Here are some time format examples:
<Jan 27, 2013 12:57:16 AM MST>
Jan 29, 2013 8:58:12 AM
2013-01-31 06:44:04,883
Some of them use the 12-hour format with AM/PM and others use the 24-hour format, so I have to account for that as well.
I have tried this as well but it doesn't work:
sed -n -e '/^2012-07-19 18:22:48/,/2012-07-23 22:39:52/p' history.log
Upvotes: 0
Views: 3258
Reputation: 754670
With the serious medley of time formats you have to parse, sed is not the correct tool to use. I'd automatically reach for Perl, but Python would do too, and you could probably do it in awk if you put your mind to it. You need to normalize the time formats (you don't say anything about dates, so I assume you're working only with the time portion).
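#!/usr/bin/env perl
use strict;
use warnings;

use constant debug => 0;

# Inclusive time-of-day window as 24-hour HH:MM (hard-coded here).
my $lo = "09:30";
my $hi = "09:40";

my $lo_tm = to_minutes($lo);
my $hi_tm = to_minutes($hi);

while (<>)
{
    print "Read: $_" if debug;
    # Only consider lines containing an H:MM:SS or HH:MM:SS time stamp.
    # (?<!\d) rather than \D so a time stamp at the start of a line matches too.
    if (m/(?<!\d)\d\d?:\d\d:\d\d/)
    {
        my $tm = normalize_hhmm($_);
        next unless defined $tm;    # skip invalid times such as 99:99:99
        print "Normalized: $tm\n" if debug;
        print $_ if ($tm >= $lo_tm && $tm <= $hi_tm);
    }
}

# Convert "HH:MM" to minutes since midnight; return undef for invalid values.
sub to_minutes
{
    my($val) = @_;
    my($hh, $mm) = split /:/, $val;
    if ($hh < 0 || $hh > 24 || $mm < 0 || $mm >= 60 || ($hh == 24 && $mm != 0))
    {
        print STDERR "to_minutes(): garbage = $val\n";
        return undef;
    }
    return $hh * 60 + $mm;
}

# Extract the first time stamp on the line and convert it to minutes since
# midnight (seconds are ignored), adjusting 12-hour times by their AM/PM suffix.
sub normalize_hhmm
{
    my($line) = @_;
    my($hhmm, $ampm) = $line =~ m/(?<!\d)(\d\d?:\d\d):\d\d\s*(AM|PM|am|pm)?/;
    my $tm = to_minutes($hhmm);
    return undef unless defined $tm;
    if (defined $ampm)
    {
        if ($ampm =~ /(am|AM)/)
        {
            $tm -= 12 * 60 if ($tm >= 12 * 60);    # 12:xx AM is 00:xx
        }
        else
        {
            $tm += 12 * 60 if ($tm < 12 * 60);     # 1:00 PM is 13:00
        }
    }
    return $tm;
}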
I used the sample data:
<Jan 27, 2013 12:57:16 AM MST>
Jan 29, 2013 8:58:12 AM
2013-01-31 06:44:04,883
Feb 2 00:00:00 AM
Feb 2 00:59:00 AM
Feb 2 01:00:00 AM
Feb 2 01:00:00 PM
Feb 2 11:00:00 AM
Feb 2 11:00:00 PM
Feb 2 11:59:00 AM
Feb 2 11:59:00 PM
Feb 2 12:00:00 AM
Feb 2 12:00:00 PM
Feb 2 12:59:00 AM
Feb 2 12:59:00 PM
Feb 2 00:00:00
Feb 2 00:59:00
Feb 2 01:00:00
Feb 2 11:59:59
Feb 2 12:00:00
Feb 2 12:59:59
Feb 2 13:00:00
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 2 09:41:00
Feb 2 23:00:00
Feb 2 23:59:00
Feb 2 24:00:00
Feb 3 09:30:00
Feb 3 09:40:00
and it produced what I consider the correct output:
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 3 09:30:00
Feb 3 09:40:00
I'm sure this isn't the only way to do the processing; it seems to work, though.
If you need to do date analysis, then you need to use one of the date or time manipulation packages from CPAN to deal with the problems. The code above also hard-codes the times in the script. You'd probably want to handle them as command-line arguments, which is perfectly doable, but isn't scripted above.
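For comparison, here is a rough sketch of the same idea in awk, wrapped in a shell function that takes the two bounds as arguments instead of hard-coding them. The function name timerange and its argument order are made up for illustration. It only looks at the time of day, ignores seconds, does no validation of the bounds, and (unlike the Perl script) does not check what precedes the time stamp, so treat it as a starting point rather than a finished tool.

```shell
# Hypothetical helper: timerange LO HI [file...]
# LO and HI are inclusive 24-hour HH:MM bounds; reads stdin if no file is given.
timerange() {
    lo=$1
    hi=$2
    shift 2
    awk -v lo="$lo" -v hi="$hi" '
        # "HH:MM" or "HH:MM:SS" -> minutes since midnight (seconds ignored)
        function to_minutes(t,    p) {
            split(t, p, ":")
            return p[1] * 60 + p[2]
        }
        BEGIN { lo_tm = to_minutes(lo); hi_tm = to_minutes(hi) }
        # Act on the first H:MM:SS or HH:MM:SS found on the line
        match($0, /[0-9][0-9]?:[0-9][0-9]:[0-9][0-9]/) {
            tm = to_minutes(substr($0, RSTART, RLENGTH))
            rest = substr($0, RSTART + RLENGTH)
            if (rest ~ /^ *[Aa][Mm]/) {
                if (tm >= 720) tm -= 720    # 12:xx AM is 00:xx
            } else if (rest ~ /^ *[Pp][Mm]/) {
                if (tm < 720) tm += 720     # 1:00 PM is 13:00
            }
            if (tm >= lo_tm && tm <= hi_tm) print
        }
    ' "$@"
}
```

It would be called as, for example, timerange 09:30 09:40 xyz.log. Passing the bounds with -v keeps them out of the awk program text, so the same function works for any window without editing the script.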
Upvotes: 1