Reputation: 191
I have a log file in the following format:
20:15:35 start opsdfslkdfflkjsdlkfjlsdkfj
20:17:21 lkjlkj lklkjlkjlkjlkjlkjlkjlkjlkj
.
.
.
20:34:11 end kljsdklasjdlaksjdasdasd
20:36:20 start lksadjlaskjdalksdj
.
.
etc
As a result of parsing this file, I would like to get the time difference between each start entry and the end entry that follows it. For the sake of consistency it should be done in bash (the other log parsing was done in bash, with plotting in gnuplot). But reading the file by redirecting it into a while loop and then using, e.g., awk to convert each timestamp to seconds makes the whole parsing extremely slow (probably because a new subprocess is created for every line).
while read line; do
  if [[ $line == *"start"* ]]
  then
    start=$(echo $line | awk '{print $1}' | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }')
    echo $start
  fi
done <log.txt
Any ideas how this can be done efficiently in bash?
Upvotes: 1
Views: 723
Reputation: 295315
It's slower than a pure awk instance would be, but in native bash, using only shell builtins:
while IFS=': ' read -r hr min sec content; do
  if [[ $content = *"start"* ]]; then
    # 10# forces base 10; otherwise 08 and 09 are rejected as invalid octal
    start=$(( 10#$hr * 3600 + 10#$min * 60 + 10#$sec ))
    echo "$start"
  fi
done <log.txt
This will also run -- much faster -- in proper David Korn ksh. (Results, particularly performance results, will vary if using a ksh clone such as mksh rather than the legitimate article).
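One caveat when doing this arithmetic in the shell: bash reads a number with a leading zero as octal, so a field such as 08 or 09 aborts the expression. A minimal sketch of a helper that sidesteps this by forcing base 10 with the 10# prefix (the name to_seconds is made up for illustration):

```shell
#!/usr/bin/env bash
# to_seconds is a hypothetical helper name, not part of the answer above.
# Converts HH:MM:SS to seconds since midnight using only builtins.
to_seconds() {
  local hr min sec
  IFS=: read -r hr min sec <<<"$1"
  # 10# forces base 10, so 08 and 09 are not rejected as invalid octal.
  echo $(( 10#$hr * 3600 + 10#$min * 60 + 10#$sec ))
}

to_seconds 08:09:05   # prints 29345
```

Calling such a function through command substitution forks a subshell, so inside a per-line loop it is probably better to inline the arithmetic than to call the helper.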
Alternately, for pure awk, you can avoid having any while read loop in bash at all:
awk -F: '/start/ { print ($1 * 3600) + ($2 * 60) + $3 }' <log.txt
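How the two approaches compare depends on the shell, the awk implementation and the size of the log, so it is worth measuring on your own data. A rough sketch of such a comparison follows; make_sample_log and the 5000-pair size are invented for illustration, and no particular timings are claimed:

```shell
#!/usr/bin/env bash
# make_sample_log is a hypothetical generator: N start/end pairs, one second apart.
make_sample_log() {
  local n=$1 i t
  for (( i = 0; i < n; i++ )); do
    t=$(( 2 * i ))
    printf '%02d:%02d:%02d start payload\n' $(( t / 3600 % 24 )) $(( t / 60 % 60 )) $(( t % 60 ))
    t=$(( 2 * i + 1 ))
    printf '%02d:%02d:%02d end payload\n' $(( t / 3600 % 24 )) $(( t / 60 % 60 )) $(( t % 60 ))
  done
}

log=$(mktemp)
make_sample_log 5000 > "$log"

echo "bash read loop:" >&2
time while IFS=': ' read -r hr min sec content; do
  if [[ $content = *"start"* ]]; then
    echo $(( 10#$hr * 3600 + 10#$min * 60 + 10#$sec ))
  fi
done < "$log" > /dev/null

echo "awk:" >&2
time awk -F: '/start/ { print ($1 * 3600) + ($2 * 60) + $3 }' < "$log" > /dev/null

rm -f "$log"
```

The timings are written to stderr by the shell's time keyword; swapping in a real log file for the generated one gives numbers that reflect your actual line lengths.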
To implement the whole thing in bash (recognizing start/end pairs and printing the deltas) might look like this:
while IFS=': ' read -r hr min sec sigil rest; do
  # 10# forces base 10 so that 08 and 09 are not rejected as invalid octal
  case $sigil in
    start) start_sec=$(( 10#$hr * 3600 + 10#$min * 60 + 10#$sec )); end_sec= ;;
    end) end_sec=$(( 10#$hr * 3600 + 10#$min * 60 + 10#$sec ))
      if [[ $start_sec ]]; then
        echo "$start_sec->$end_sec -- $(( end_sec - start_sec )) elapsed"
        start_sec=
      fi
      ;;
  esac
done <log.txt
...or, for the whole thing in awk:
awk -F: '
/start/ { start=( ($1 * 3600) + ($2 * 60) + $3 ) }
/end/ { end=( ($1 * 3600) + ($2 * 60) + $3 );
if (start) {
print start " -> " end " -- " (end - start) " elapsed"
start=0
}
}
' <log.txt
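Note that /start/ and /end/ match those words anywhere on the line, including inside the message text, and that the if (start) test treats a start at exactly 00:00:00 as unset. If that matters for your logs, a variant along the following lines may help. It is only a sketch: it assumes the keyword is always the second whitespace-separated field, and the names elapsed_awk, secs and have_start are made up here.

```shell
#!/usr/bin/env bash
# elapsed_awk is a hypothetical wrapper; it reads the log on stdin.
elapsed_awk() {
  awk '
    # Convert an HH:MM:SS string to seconds since midnight.
    function secs(ts,   t) { split(ts, t, ":"); return t[1] * 3600 + t[2] * 60 + t[3] }
    $2 == "start" { start = secs($1); have_start = 1 }
    $2 == "end" && have_start {
      end = secs($1)
      print start " -> " end " -- " (end - start) " elapsed"
      have_start = 0
    }
  '
}

# Sample lines taken from the question.
elapsed_awk <<'EOF'
20:15:35 start opsdfslkdfflkjsdlkfjlsdkfj
20:17:21 lkjlkj lklkjlkjlkjlkjlkjlkjlkjlkj
20:34:11 end kljsdklasjdlaksjdasdasd
20:36:20 start lksadjlaskjdalksdj
EOF
# prints: 72935 -> 74051 -- 1116 elapsed
```

Using a separate flag rather than the start value itself keeps a midnight start from being mistaken for "no start seen yet".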
Upvotes: 6
Reputation: 2868
This is a fun version that uses a sed script and a compound list of commands, with no while loop. Its execution time should be comparable to awk, though I believe it is still slower; that needs to be confirmed with some testing.
Try this to print the time for each line with start:
sed -n '/^[0-9:]* start /{s/^\([0-9]*\):\([0-9]*\):\([0-9]*\) .*$/\1 60*\2+60*\3+p/p}' log.txt | dc
dc is a reverse-Polish desk calculator.
sed is used to select the lines with start and to build the string that is fed to dc.
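To make the intermediate step concrete, the sketch below shows the string sed builds for one line; dc then evaluates it as ((h * 60) + m) * 60 + s. The function name to_rpn is invented for illustration, a ; is added before the closing brace because some sed implementations require it, and the dc call is guarded because dc is not installed everywhere.

```shell
#!/usr/bin/env bash
# to_rpn is a hypothetical name: it turns each "start" line on stdin into a dc program.
to_rpn() {
  sed -n '/^[0-9:]* start /{s/^\([0-9]*\):\([0-9]*\):\([0-9]*\) .*$/\1 60*\2+60*\3+p/p;}'
}

echo '20:15:35 start opsdfslkdfflkjsdlkfjlsdkfj' | to_rpn
# prints: 20 60*15+60*35+p

# dc reads this as: 20 60 * 15 + 60 * 35 + p, i.e. (20*60 + 15)*60 + 35 = 72935
if command -v dc >/dev/null 2>&1; then
  echo '20:15:35 start opsdfslkdfflkjsdlkfjlsdkfj' | to_rpn | dc
fi
```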
If your file always contains start and end in pairs, in that order, try this to calculate the time differences:
printf "%s %sr-p" $(sed -n '/^[0-9:]* \(start\|end\) /{s/^\([0-9]*\):\([0-9]*\):\([0-9]*\) .*$/\1 60*\2+60*\3+p/p}' log.txt | dc) | dc
printf groups the start and end times in pairs and generates another string, which a second dc uses to calculate each difference.
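printf reuses its format string until all arguments are consumed, which is what pairs the values up. A small sketch with illustrative second counts (72935 and 74051 correspond to the first start/end pair in the question; the other two are arbitrary). The name pair_program is hypothetical:

```shell
#!/usr/bin/env bash
# pair_program is a hypothetical name: it builds the second dc program from a flat list.
pair_program() {
  printf "%s %sr-p" "$@"
}

pair_program 72935 74051 74180 75000
echo   # printf emits no trailing newline, so add one for display
# prints: 72935 74051r-p74180 75000r-p

# In each group, r swaps the two values, - subtracts and p prints,
# so dc would report end minus start: 1116 and then 820.
if command -v dc >/dev/null 2>&1; then
  pair_program 72935 74051 74180 75000 | dc
fi
```

If the log ends with a start that has no matching end, the last group receives only one value, so the pairing assumption stated above really does need to hold.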
Upvotes: 0