Sam

Reputation: 113

Regex to match logfile custom date formats

I'm trying to extract the lines that fall within a date range from a log file. However, the dates are formatted in a non-standard way. Is it possible for a regex to match these? The log file is formatted like so:

Jan  5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb  1 01:14:00 log messages here
Feb 10 16:32:00 more messages
Mar  7 16:32:00 more messages
Apr 21 16:32:00 more messages

For example, if I want to match lines between January 1st and February 10th, I've been unable to get a regex to respect the month order, since the months aren't numerical.

Upvotes: 0

Views: 235

Answers (1)

kvantour

Reputation: 26471

One of the following shell commands might do the trick. The first one assumes you want to see January 2nd and the 41 days that follow it.

pipeline of echo, date and grep:

echo {0..41} \
  | xargs -I{} -d ' ' date -d "2018-01-02 + {} days" +"%b %e" \
  | grep -F -f - <logfile>

I believe this is the quickest approach. The idea is to build the list of days in the range (the first two lines of the command) and then search the log file for them with grep.
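For reference, the pattern-building step can be wrapped in a small function. This is only a sketch: day_patterns is a made-up name, it assumes GNU date (for the -d option), and it pins the locale to C so that %b yields English month abbreviations.

```shell
# Hypothetical helper: print one "Mon DD" pattern per day.
#   $1 = first day as YYYY-MM-DD, $2 = number of days to emit
# Assumes GNU date; %e pads the day with a space, as in the log file.
day_patterns() {
  i=0
  while [ "$i" -lt "$2" ]; do
    LC_ALL=C date -d "$1 + $i days" +"%b %e"
    i=$((i + 1))
  done
}
```

It could then be used as day_patterns 2018-01-02 42 | grep -F -f - <logfile>. Keep in mind that grep -F matches the pattern anywhere in the line, so a message that happens to contain a date such as "Jan 13" would be selected too.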

sorted log-file with awk:

When the log file is sorted, you can skip the lines before the start of the range and exit as soon as you pass its end, so only the needed part of the file is processed.

awk -v tstart="Jan  1" -v tend="Feb 10" '
   BEGIN{ month["Jan"]=1; month["Feb"]=2; month["Mar"]=3
          month["Apr"]=4; month["May"]=5; month["Jun"]=6
          month["Jul"]=7; month["Aug"]=8; month["Sep"]=9
          month["Oct"]=10;month["Nov"]=11;month["Dec"]=12
          $0=tstart; ms=$1; ds=$2
          $0=tend  ; me=$1; de=$2
         }
  (month[$1]<month[ms])             { next }
  (month[$1]==month[ms]) && ($2<ds) { next }
  (month[$1]==month[me]) && ($2>de) { exit }
  (month[$1]>month[me])             { exit }
  1' <logfile>

unsorted log-file with awk:

When the log file is unsorted, every line has to be compared against the range, so the whole file must be read. This takes more time.

awk -v tstart="Jan  1" -v tend="Feb 10" '
   BEGIN{ month["Jan"]=1; month["Feb"]=2; month["Mar"]=3
          month["Apr"]=4; month["May"]=5; month["Jun"]=6
          month["Jul"]=7; month["Aug"]=8; month["Sep"]=9
          month["Oct"]=10;month["Nov"]=11;month["Dec"]=12
          $0=tstart; ms=$1; ds=$2
          $0=tend  ; me=$1; de=$2
         }
   (ms == me) { if (($1 == ms) && (ds<=$2) && ($2<=de)) print; next }
   ($1 == ms) && (ds<=$2)                           { print; next }
   ($1 == me) && ($2<=de)                           { print; next }
   (month[ms]<month[$1]) && (month[$1]<month[me])' <logfile>

Both awk commands above return:

Jan  5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb  1 01:14:00 log messages here
Feb 10 16:32:00 more messages

Note: date ranges that cross December 31st might give bogus results, because the timestamps carry no year and the comparisons only look at month and day.
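If a range does need to cross the year boundary in an unsorted file, one possible workaround is to treat a start that lies later in the year than the end as a wrapped range. The sketch below is an assumption on top of the commands above, not a tested replacement for them: filter_range is a made-up name, and it reads the log from standard input.

```shell
# Hypothetical helper: print the lines whose "Mon DD" stamp lies in [start, end].
#   $1 = start, $2 = end, both written as in the log (e.g. "Jan  1", "Feb 10")
# If start is later in the year than end, the range is assumed to wrap past Dec 31.
filter_range() {
  awk -v tstart="$1" -v tend="$2" '
    BEGIN {
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= n; i++) month[names[i]] = i
      # Encode each date as month*100 + day so a single number can be compared
      split(tstart, s, " "); lo = month[s[1]] * 100 + s[2]
      split(tend,   e, " "); hi = month[e[1]] * 100 + e[2]
    }
    !($1 in month) { next }            # skip lines without a known month
    {
      key = month[$1] * 100 + $2
      if (lo <= hi) keep = (lo <= key && key <= hi)   # ordinary range
      else          keep = (key >= lo || key <= hi)   # wraps past Dec 31
      if (keep) print
    }'
}
```

For example, filter_range "Dec 20" "Jan 10" < <logfile> would keep lines from December 20th onwards as well as lines up to January 10th. Since the log has no year, lines from different years that share a month and day still cannot be told apart.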

Upvotes: 1
