Reputation: 113
I'm trying to parse the lines that fall within a date range in a file. However, the dates are formatted in a non-standard way. Is it possible for a regex to match these? The log file is formatted like so:
Jan 5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb 1 01:14:00 log messages here
Feb 10 16:32:00 more messages
Mar 7 16:32:00 more messages
Apr 21 16:32:00 more messages
For example, if I want to match lines between January 1st and February 10th, I've been unable to get a regex to match the month order, since the months aren't numerical.
Upvotes: 0
Views: 235
Reputation: 26471
The following shell line might do the trick. Assume you want to see the 42 days starting on January 2nd, 2018 (day offsets 0 through 41); then you can use a
pipeline of echo, date and grep:
echo {0..41} \
| xargs -I{} -d ' ' date -d "2018-01-02 + {} days" +"%b %e" \
| grep -F -f - <logfile>
I believe this is the quickest. The idea is to build the set of possible days (the first two lines do this) and then search for them with grep.
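One thing to watch with this pipeline: %e pads single-digit days with a space (it prints "Jan  5" with two spaces), and grep -F matches the string anywhere in the line, not just at the start. If your log uses a single space before single-digit days, as the sample appears to, the following is a minimal sketch of a variant that anchors each pattern at the start of the line and tolerates either spacing. The function name lines_in_range and its arguments are made up for illustration, and it assumes GNU date (for -d and %-d).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of FILE dated within COUNT days
# starting at START (YYYY-MM-DD). Assumes GNU date and log lines that
# begin with "Mon D HH:MM:SS".
lines_in_range() {
    local start=$1 count=$2 file=$3 i
    for ((i = 0; i < count; i++)); do
        # Emits an anchored regex such as "^Jan +5 ": %-d drops the padding,
        # " +" accepts one or more spaces, the trailing space stops "1" matching "13".
        LC_ALL=C date -d "$start + $i days" +'^%b +%-d '
    done | grep -E -f - "$file"
}
```

For the range in the question (January 1st through February 10th, which is 41 days) the call would be: lines_in_range 2018-01-01 41 logfile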
sorted log-file with awk:
When processing a sorted log file, you can skip the lines before the range and exit as soon as you pass its end, so only the needed part of the file is processed.
awk -v tstart="Jan 1" -v tend="Feb 10" '
BEGIN{ month["Jan"]=1; month["Feb"]=2; month["Mar"]=3
month["Apr"]=4; month["May"]=5; month["Jun"]=6
month["Jul"]=7; month["Aug"]=8; month["Sep"]=9
month["Oct"]=10;month["Nov"]=11;month["Dec"]=12
$0=tstart; ms=$1; ds=$2
$0=tend ; me=$1; de=$2
}
(month[$1]<month[ms]) { next }
(month[$1]==month[ms]) && ($2<ds) { next }
(month[$1]==month[me]) && ($2>de) { exit }
(month[$1]>month[me]) { exit }
1' <logfile>
unsorted log-file with awk:
When processing an unsorted log file, every line has to be compared against the range. This takes more time, since the whole file must be read.
awk -v tstart="Jan 1" -v tend="Feb 10" '
BEGIN{ month["Jan"]=1; month["Feb"]=2; month["Mar"]=3
month["Apr"]=4; month["May"]=5; month["Jun"]=6
month["Jul"]=7; month["Aug"]=8; month["Sep"]=9
month["Oct"]=10;month["Nov"]=11;month["Dec"]=12
$0=tstart; ms=$1; ds=$2
$0=tend ; me=$1; de=$2
}
(ms == me) { if (($1 == ms) && (ds<=$2) && ($2<=de)) print; next }
($1 == ms) && (ds<=$2) { print; next }
($1 == me) && ($2<=de) { print; next }
(month[ms]<month[$1]) && (month[$1]<month[me])' <logfile>
Both awk commands above return:
Jan 5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb 1 01:14:00 log messages here
Feb 10 16:32:00 more messages
Note: the log lines carry no year, so date ranges that cross the 31st of December might give bogus results.
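If you are willing to assume that a start date falling later in the calendar than the end date means the range wraps past December 31st, the year-end case can be handled too. The following is only a rough sketch under that assumption: it reduces every date to a single number (month*100 + day) and compares those. The function name date_range is made up for illustration, and the input does not need to be sorted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: date_range "Mon D" "Mon D" FILE
# Prints the lines of FILE whose leading "Mon D" lies in the given range.
date_range() {
    awk -v tstart="$1" -v tend="$2" '
    # Sortable key: "Feb 10" -> 2*100 + 10 = 210
    function key(mon, day) { return month[mon] * 100 + day }
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) month[names[i]] = i
        split(tstart, s, " "); ks = key(s[1], s[2])
        split(tend,   e, " "); ke = key(e[1], e[2])
    }
    !($1 in month) { next }   # skip lines without a known month prefix
    {
        k = key($1, $2)
        if (ks <= ke) inside = (ks <= k && k <= ke)   # ordinary range
        else          inside = (k >= ks || k <= ke)   # assumed to wrap past Dec 31
        if (inside) print
    }' "$3"
}
```

With the sample log, date_range "Jan 1" "Feb 10" logfile gives the same four lines as above, while date_range "Mar 1" "Jan 10" logfile is treated as a wrapping range and prints the Jan 5, Mar 7 and Apr 21 lines.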
Upvotes: 1