user2014600

Reputation: 13

Search for text between two time stamps using sed

I have log files with time stamps. I want to use sed to find the text between two time stamps, even if the first or the last time stamp is not present in the file. For example, if I search between 9:30 and 9:40, it should return every line whose time stamp falls between 9:30 and 9:40, even if neither 9:30 nor 9:40 itself appears.

I am using a sed one-liner:

sed -n '/7:30:/,/7:35:/p' xyz.log  

But it only returns the right data if both time stamps are present; if either time stamp is missing, it prints everything. And if the times are in 12-hour format, it pulls data for both AM and PM.

Additionally, different log files use different time stamp formats, so I need a generic command.

Here are some time format examples:

<Jan 27, 2013 12:57:16 AM MST>

Jan 29, 2013 8:58:12 AM 

2013-01-31 06:44:04,883

Some of them contain AM/PM (i.e. 12-hour format) and others use 24-hour format, so I have to account for that as well.

I have tried this as well but it doesn't work:

sed -n -e '/^2012-07-19 18:22:48/,/2012-07-23 22:39:52/p' history.log

Upvotes: 0

Views: 3258

Answers (1)

Jonathan Leffler

Reputation: 754670

With the serious medley of time formats you have to parse, sed is not the correct tool to use. I'd automatically reach for Perl, but Python would do too, and you probably could do it in awk if you put your mind to it. You need to normalize the time formats (you don't say anything about date, so I assume you're working only with the time portion).


#!/usr/bin/env perl
use strict;
use warnings;
use constant debug => 0;

my $lo = "09:30";
my $hi = "09:40";

my $lo_tm = to_minutes($lo);
my $hi_tm = to_minutes($hi);

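while (<>)
{
    print "Read: $_" if debug;
    # Only consider lines containing H:MM:SS or HH:MM:SS.
    # (?<!\d) also matches a time at the very start of the line.
    if (m/(?<!\d)\d\d?:\d\d:\d\d/)
    {
        my $tm = normalize_hhmm($_);
        next unless defined $tm;    # Skip lines whose time is not valid
        print "Normalized: $tm\n" if debug;
        print $_ if ($tm >= $lo_tm && $tm <= $hi_tm);
    }
}

# Convert "HH:MM" to minutes since midnight; return undef for an invalid time
sub to_minutes
{
    my($val) = @_;
    my($hh, $mm) = split /:/, $val;
    if ($hh < 0 || $hh > 24 || $mm < 0 || $mm >= 60 || ($hh == 24 && $mm != 0))
    {
        print STDERR "to_minutes(): garbage = $val\n";
        return undef;
    }
    return $hh * 60 + $mm;
}

# Extract the first time on the line (seconds are ignored) and convert it
# to minutes since midnight, allowing for an optional AM/PM suffix
sub normalize_hhmm
{
    my($line) = @_;
    my($hhmm, $ampm) = $line =~ m/(?<!\d)(\d\d?:\d\d):\d\d\s*(AM|PM|am|pm)?/;
    my $tm = to_minutes($hhmm);
    return undef unless defined $tm;
    if (defined $ampm)
    {
        if ($ampm =~ /(am|AM)/)
        {
            # 12:xx AM is 00:xx on the 24-hour clock
            $tm -= 12 * 60 if ($tm >= 12 * 60);
        }
        else
        {
            # 1:xx PM to 11:xx PM move to the afternoon; 12:xx PM is unchanged
            $tm += 12 * 60 if ($tm < 12 * 60);
        }
    }
    return $tm;
}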

I used the sample data:

<Jan 27, 2013 12:57:16 AM MST>

Jan 29, 2013 8:58:12 AM 

2013-01-31 06:44:04,883

Feb 2 00:00:00 AM
Feb 2 00:59:00 AM
Feb 2 01:00:00 AM
Feb 2 01:00:00 PM
Feb 2 11:00:00 AM
Feb 2 11:00:00 PM
Feb 2 11:59:00 AM
Feb 2 11:59:00 PM
Feb 2 12:00:00 AM
Feb 2 12:00:00 PM
Feb 2 12:59:00 AM
Feb 2 12:59:00 PM

Feb 2 00:00:00
Feb 2 00:59:00
Feb 2 01:00:00
Feb 2 11:59:59
Feb 2 12:00:00
Feb 2 12:59:59
Feb 2 13:00:00
Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 2 09:41:00
Feb 2 23:00:00 
Feb 2 23:59:00
Feb 2 24:00:00
Feb 3 09:30:00
Feb 3 09:40:00

and it produced what I consider the correct output (seconds are ignored, so 09:40:59 counts as 09:40 and is inside the range):

Feb 2 09:31:00
Feb 2 09:35:23
Feb 2 09:36:23
Feb 2 09:37:23
Feb 2 09:35:00
Feb 2 09:40:00
Feb 2 09:40:59
Feb 3 09:30:00
Feb 3 09:40:00

I'm sure this isn't the only way to do the processing; it seems to work, though.


If you need to do date analysis as well, then you need to use one of the date or time manipulation packages from CPAN to parse and compare the full time stamps. The code above also hard-codes the times in the script. You'd probably want to handle them as command line arguments, which is perfectly doable, but isn't scripted above.
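For what it's worth, here is a minimal sketch of how the two bounds might be taken from the command line instead of being hard-coded. The helper names hhmm_to_minutes and parse_bounds are hypothetical (they are not part of the script above), and the demonstration at the end uses a local array in place of @ARGV so that it runs without any arguments.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: convert "HH:MM" to minutes since midnight,
# or return undef if the value does not look like a valid time.
sub hhmm_to_minutes
{
    my($val) = @_;
    return undef unless defined $val && $val =~ m/^(\d\d?):(\d\d)$/;
    my($hh, $mm) = ($1, $2);
    return undef if $hh > 24 || $mm >= 60 || ($hh == 24 && $mm != 0);
    return $hh * 60 + $mm;
}

# Hypothetical helper: remove the first two entries from the argument
# list, validate them as the low and high bounds, and return both in
# minutes.  Whatever remains in the list is treated as file names.
sub parse_bounds
{
    my($args) = @_;
    die "Usage: $0 HH:MM HH:MM [file ...]\n" if @$args < 2;
    my($lo, $hi) = map { hhmm_to_minutes($_) } splice(@$args, 0, 2);
    die "$0: times must look like HH:MM\n" unless defined $lo && defined $hi;
    die "$0: start time is after end time\n" if $lo > $hi;
    return ($lo, $hi);
}

# Demonstration with a local list standing in for @ARGV.
my @args = ("09:30", "09:40", "xyz.log");
my($lo_tm, $hi_tm) = parse_bounds(\@args);
print "lo=$lo_tm hi=$hi_tm files=@args\n";    # prints "lo=570 hi=580 files=xyz.log"
```

In the real script you would presumably call parse_bounds(\@ARGV) before the while (<>) loop; because the two times are removed from @ARGV, the loop then reads only the remaining file names (or standard input if none are given).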

Upvotes: 1
