Reputation: 53
I have existing log files that contain, among others, lines of the following type:
2018-05-14T10:10:22.769029+03:00 timom usbmonitor: [INFORMATION 6] [FILE: UsbChecker.cpp:51][FUNC: vendorCheck][MSG: USB vendors changed: "0403 14e1 05e3 05e3 03f0 0403 0bda 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b" ]
From these files I want to grep lines like the one above and extract the timestamp from the beginning plus the text inside the quotes, so that I get a nice, compact output:
2018-05-14T10:10:22.769029+03:00 0403 14e1 05e3 05e3 03f0 0403 0bda 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b
Is there a way to do this with a one-liner?
I'm looking for a way to efficiently get the desired output without the need to loop over grepped lines. I have thousands of log files each of which may have hundreds of matches so the grep/sed/whatever needs to be efficient.
So far I've done it like this:
#!/bin/bash
INPUTDIR=
OUTPUTDIR=
# -h takes no argument; -d and -o each take one
while getopts ":hd:o:" OPTION; do
    case $OPTION in
        h)
            usage
            exit 1
            ;;
        d)
            INPUTDIR=$OPTARG
            ;;
        o)
            OUTPUTDIR=$OPTARG
            ;;
        ?)
            usage
            exit 1
            ;;
    esac
done
if [ -z "$INPUTDIR" ] || [ -z "$OUTPUTDIR" ]; then
    echo "BAD ARGUMENTS: both directories aren't given" >&2
    usage
    exit 1
fi
OUTPUTFILE="$(date +%Y%m%d%H%M%S)-usb-analysis-summary"
for i in $( ls "$INPUTDIR" ); do
    # Interesting files are of format <number>_<number>
    if [ -n "$(echo "$i" | grep -Ev "^[0-9]+_[0-9]+$")" ] ; then
        echo "Skipping $i"
        continue
    fi
    grep vendorCheck "$INPUTDIR/$i" | while read -r l ; do
        # We do know timestamp is 32 characters long. GEFN
        echo "$l" | sed -r "s|^(.{32}).*changed: \"(.*)\".*|\1 \2|" >>"$OUTPUTFILE"
    done
done
But this is not optimal, as I'm now looping over the files and then looping over the grep matches from each file.
I tried
grep "vendorCheck" $INPUTDIR/$i | sed -r "s|^(.{32}).*changed: \"(.*)\".*|\1 \2|"
But this removes line breaks.
If I put multiple patterns in one grep, I also run into trouble with formatting: I need the timestamp and the text inside the quotes on one line, and the next such match on the next line.
Upvotes: 0
Views: 150
Reputation: 15246
Sed can do the line selection, matching, and editing all in one go.
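For instance, here is a minimal sketch of that idea run on the sample line from the question. It assumes a sed that accepts -E (GNU sed does); extract_vendor_changes is just a name made up for illustration, and the timestamp is matched as "everything up to the first space" rather than by counting 32 characters:

```shell
#!/bin/bash
# Hypothetical helper: print "<timestamp> <quoted text>" for every line that
# mentions vendorCheck, in a single sed pass over the files given as arguments.
extract_vendor_changes() {
    # -n suppresses default output; the trailing p prints only edited matches.
    sed -E -n '/vendorCheck/s/^([^ ]+) .*changed: "([^"]*)".*/\1 \2/p' "$@"
}

# Sample data copied from the question, written to a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/1_1" <<'EOF'
2018-05-14T10:10:22.769029+03:00 timom usbmonitor: [INFORMATION 6] [FILE: UsbChecker.cpp:51][FUNC: vendorCheck][MSG: USB vendors changed: "0403 14e1 05e3 05e3 03f0 0403 0bda 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b 1d6b" ]
2018-05-14T10:10:23.000000+03:00 timom usbmonitor: [INFORMATION 6] some unrelated line
EOF

result=$(extract_vendor_changes "$tmpdir/1_1")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Since sed reads every file named on its command line, the same function can be handed many files at once, and each match still ends up on its own line.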
You could also use $(...) to generate sed's input file list, so you really can get it all into one line, I think, but that ls isn't ideal, and you said you needed filenames in a comment below, so...
Rather than
( cd "$INPUTDIR" && sed -r -n '/vendorCheck/{s/(.{32}).*changed: "(.*)".*/\1 \2/; p;}' $( ls -1 | grep -E '^[0-9]+_[0-9]+$' ) ) >> "$OUTPUTFILE"
You can embed some whitespace to make it a little less ugly without changing the "one-liner" functionality, and a loop can replace the ls:
for f in "$INPUTDIR"/[0-9]*_[0-9]*                # limit input, not a definitive check
do [[ ${f##*/} =~ ^[0-9]+_[0-9]+$ ]] || continue  # CONFIRM filename match (on the basename only)
   [[ -f $f ]] || continue                        # and assert file, not dir
   sed -r -n "/vendorCheck/{
      s/(.{32}).*changed: \"(.*)\".*/\1 \2/;
      s|^|$f: |;
      p;
   }" "$f"   # the "s|^|$f: |;" is a placeholder for your need for the name; | as delimiter because $f contains slashes
done >> "$OUTPUTFILE"
NOTE: deleted my test data, so this rework didn't get vetted as carefully. Let me know if anyone sees a typo.
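With thousands of files, the filename check (names of the form <number>_<number>) can also be done without spawning any external process, using only a case pattern. A minimal sketch; is_log_name and the scratch file names are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: succeed only for names of the form <digits>_<digits>.
is_log_name() {
    case $1 in
        *[!0-9_]*|_*|*_|*_*_*) return 1 ;;  # stray characters, leading/trailing or repeated underscore
        *_*) return 0 ;;                    # digits, exactly one underscore, digits
        *) return 1 ;;                      # no underscore at all
    esac
}

# Scratch directory with one good name and three that must be skipped.
tmpdir=$(mktemp -d)
touch "$tmpdir/12_34" "$tmpdir/12_34.bak" "$tmpdir/notes" "$tmpdir/1_2_3"

matched=""
for f in "$tmpdir"/*; do
    name=${f##*/}                           # strip the directory part
    if [ -f "$f" ] && is_log_name "$name"; then
        matched="$matched $name"
    fi
done
echo "selected:$matched"
rm -rf "$tmpdir"
```

Only 12_34 should be selected here; the other three names are rejected by the first case branch.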
Upvotes: 1