bash efficient file parsing

Question

I have log file in following format:

20:15:35 start opsdfslkdfflkjsdlkfjlsdkfj
20:17:21 lkjlkj lklkjlkjlkjlkjlkjlkjlkjlkj
.
.
.
20:34:11 end kljsdklasjdlaksjdasdasd
20:36:20 start lksadjlaskjdalksdj
.
.
etc

In result of parsing this file I would like to get time differences between subsequent start and end entries. For the sake of consistency it should be done in bash ( other log parsing was done in bash + plotting with gnuplot ). But reading file by redirecting it to while loop and then using e.g. awk to convert timestamp to seconds makes whole parsing extremely slow ( probably due to creating new subprocess per each line ).

while read line; do
    if [[ $string == *"start"* ]]
    then
        start=$(echo $line | awk '{print $1}' | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }')
        echo $start
    fi
done



Any ideas how this can be done efficiently in bash?

Charles Duffy · Accepted Answer

It's slower than a pure awk instance would be, but in native bash, using only shell builtins:

while IFS=': ' read -r hr min sec content; do
  if [[ $content = *"start"* ]]; then
    start=$(( hr * 3600 + min * 60 + sec ))
    echo "$start"
  fi
done



This will also run -- much faster -- in proper David Korn ksh. (Results, particularly performance results, will vary if using a ksh clone such as mksh rather than the legitimate article).



Alternately, for pure awk, you can avoid having any while read loop in bash at all:

awk -F: '/start/ { print ($1 * 3600) + ($2 * 60) + $3 }' 




To implement the whole thing in bash (recognizing start/end pairs and printing the deltas) might look like this:

while IFS=': ' read -r hr min sec sigil rest; do
  case $sigil in
    start) start_sec=$(( hr * 3600 + min * 60 + sec )); end_sec= ;;
    end)   end_sec=$(( hr * 3600 + min * 60 + sec ))
           if [[ $start_sec ]]; then
             echo "$start_sec->$end_sec -- $(( end_sec - start_sec )) elapsed"
             start_sec=
           fi
           ;;
  esac
done 


...or, for the whole thing in awk:

awk -F: '
  /start/ { start=( ($1 * 3600) + ($2 * 60) + $3 ) }
  /end/   { end=(   ($1 * 3600) + ($2 * 60) + $3 );
            if (start) {
              print start " -> " end " -- " (end - start) " elapsed"
              start=0
            }
          }
'

bash efficient file parsing

Answers (2)

Related Questions