Robert

Reputation: 169

Bash Script is super slow

I'm updating an old script to parse ARP data and get useful information out of it. We added a new router, and while I can pull the ARP data out of it, the data is in a new format. I've got a file "zTempMonth" that contains all the ARP data from both sets of routers, and I need to compile it down into a new, normalized data format. The lines of code below do what I need logically, but they're extremely slow: these loops will take days to run, where previously the script took 20-30 minutes. Is there a way to speed this up, or to identify what's slowing it down?

Thank you in advance,

    echo "Parsing zTempMonth"
    while read LINE
    do
            wc=`echo $LINE | wc -w`
            if [[ $wc -eq "6" ]]; then
                    true
                    out=$(echo $LINE | awk '{ print $2 " " $4 " " $6}')
                    echo $out >> zTempMonth.tmp

            else
                    false
            fi

            if [[ $wc -eq "4" ]]; then
                    true
                    out=$(echo $LINE | awk '{ print $1 " " $3 " " $4}')
                    echo $out >> zTempMonth.tmp
            else
                    false
            fi


    done < zTempMonth

Upvotes: 4

Views: 11479

Answers (2)

Eduardo Cuomo

Reputation: 19016

When writing shell scripts, it’s almost always better to call a function directly rather than using a subshell to call the function. The usual convention that I’ve seen is to echo the return value of the function and capture that output using a subshell.

For example:

    #!/bin/bash
    function get_path() {
      echo "/path/to/something"
    }

    mypath="$(get_path)"

This works fine, but using a subshell carries a significant speed overhead, and there is a much faster alternative: adopt a convention in which a particular variable always holds the function's return value (I use retval). This has the added benefit of letting you return arrays from your functions.
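A minimal sketch of the retval convention returning an array, assuming bash (not POSIX sh). The helper name split_words and the sample line are hypothetical; read -a fills retval in the current shell, so no subshell is involved.

```bash
#!/usr/bin/env bash

# Global "return register" shared by functions (the retval convention).
retval=()

# Hypothetical helper: split its first argument on whitespace into retval.
split_words() {
  # read -a assigns the array in the current shell; nothing is forked.
  read -ra retval <<< "$1"
}

split_words "10.0.0.1 aa:bb:cc:dd:ee:ff eth0"
echo "${#retval[@]} fields, first is ${retval[0]}"   # 3 fields, first is 10.0.0.1
```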

If you don't know what a subshell is: for the purposes of this answer, a subshell is a separate bash process that is forked whenever you use $(...) or backticks (`...`), and it executes the code you put inside them.
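A quick way to see that $(...) really runs in a separate process: a variable changed inside the substitution does not survive it, and $BASHPID (bash 4 or later) has a different value inside. The function bump is a hypothetical example, not from the original post.

```bash
#!/usr/bin/env bash

counter=0

# Increments the global counter and prints its new value.
bump() {
  counter=$((counter + 1))
  echo "$counter"
}

captured=$(bump)            # bump runs in a subshell; its change to counter is lost
echo "captured=$captured counter=$counter"   # captured=1 counter=0

bump > /dev/null            # called directly: runs in the current shell
echo "counter=$counter"     # counter=1

# BASHPID is expanded inside the forked subshell, so it differs from the parent's.
echo "parent=$BASHPID child=$(echo "$BASHPID")"
```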

I did some simple testing so you can observe the overhead, using two functionally equivalent scripts.

This one uses a subshell:

    #!/bin/bash

    function foo() {
      # Return value
      echo hello
    }

    for (( i = 0; i < 10000; i++ )); do
      result="$(foo)"
      echo $result
    done

This one uses a variable:

    #!/bin/bash

    # Initialize
    retval=""

    function foo() {
      # Return value
      retval="hello"
    }

    for (( i = 0; i < 10000; i++ )); do
      foo
      echo $retval
    done

The speed difference between these two is noticeable and significant.

    $ for i in variable subshell; do
    >   echo -e "\n$i"
    >   time ./$i > /dev/null
    > done

    variable

    real 0m0.367s
    user 0m0.346s
    sys 0m0.015s

    subshell

    real 0m11.937s
    user 0m3.121s
    sys 0m0.359s

(variable and subshell are executable scripts)

As you can see, the variable version runs in 0.367 seconds, while the subshell version takes a full 11.937 seconds!

Source: http://rus.har.mn/blog/2010-07-05/subshells/
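If you'd rather reproduce the comparison in a single file, the sketch below times both styles with bash's time keyword. The function names are hypothetical, TIMEFORMAT='%R seconds' only changes how time reports its result (on stderr), and the numbers you get will depend on your machine, so none are quoted here.

```bash
#!/usr/bin/env bash

N=${N:-10000}      # iteration count; override with N=... in the environment
retval=""

foo_echo()   { echo hello; }        # "returns" via stdout, needs $(...)
foo_retval() { retval="hello"; }    # "returns" via the retval variable

TIMEFORMAT='%R seconds'             # format used by the time keyword

echo "subshell:"
time for (( i = 0; i < N; i++ )); do
  result="$(foo_echo)"              # one fork per iteration
done

echo "variable:"
time for (( i = 0; i < N; i++ )); do
  foo_retval                        # no fork
  result="$retval"
done
```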

Alternative for your problem

Finally, you can rewrite your script like the following, which counts and splits the words of each line inside the current shell instead of piping them through wc and awk:

    echo "Parsing zTempMonth"

    while read -r LINE; do
      # Split the line into an array in the current shell: no subshell,
      # no pipeline, and no external wc or awk process per line
      read -ra words <<< "$LINE"
      wc=${#words[@]}

      if [[ $wc -eq 6 ]]; then
        echo "${words[1]} ${words[3]} ${words[5]}" >> zTempMonth.tmp
      elif [[ $wc -eq 4 ]]; then
        echo "${words[0]} ${words[2]} ${words[3]}" >> zTempMonth.tmp
      fi
    done < zTempMonth

Upvotes: 8

kojiro

Reputation: 77197

  1. While read loops are slow.
  2. Subshells in a loop are slow.
  3. >> (open(f, 'a')) calls in a loop are slow.
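Point 3 can be addressed even when redirecting the whole loop is awkward: open the file once with exec on a spare descriptor and write to that descriptor inside the loop. A minimal sketch; descriptor 3, the mktemp file standing in for zTempMonth.tmp, and the sample words are arbitrary assumptions.

```bash
#!/usr/bin/env bash

out=$(mktemp)          # stand-in for zTempMonth.tmp

# Open the output file for appending once, on file descriptor 3.
exec 3>>"$out"

for word in alpha beta gamma; do
  # Each write reuses the already-open descriptor instead of reopening the file.
  printf '%s\n' "$word" >&3
done

exec 3>&-              # close the descriptor when done
cat "$out"
```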

You could speed this up and remain in pure bash, just by losing #2 and #3:

    #!/usr/bin/env bash

    # -r keeps backslashes in the input literal; -a splits each line into an array
    while read -ra line; do
        case "${#line[@]}" in
            6) printf '%s %s %s\n' "${line[1]}" "${line[3]}" "${line[5]}";;
            4) printf '%s %s %s\n' "${line[0]}" "${line[2]}" "${line[3]}";;
        esac
    done < zTempMonth >> zTempMonth.tmp

But if there are more than a few lines, this will still be slower than pure awk. Consider an awk script as simple as this:

    BEGIN {
        # Send the progress message to stderr so it is not appended to the output file
        print "Parsing zTempMonth" > "/dev/stderr"
    }

    NF == 6 {
        print $2 " " $4 " " $6
    }

    NF == 4 {
        print $1 " " $3 " " $4
    }

You could execute it like this:

    awk -f thatAwkScript zTempMonth >> zTempMonth.tmp

to get the same append approach as your current script.
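To check that switching from the bash loop to awk doesn't change the results, both can be run over the same input and compared. A minimal sketch: the three input lines are made up (the question doesn't show the real zTempMonth layout), and a temporary directory stands in for the real files.

```bash
#!/usr/bin/env bash
# Compare the pure-bash loop with an inline awk program on made-up input.

tmpdir=$(mktemp -d)

# Hypothetical stand-in for zTempMonth: a 6-field line, a 4-field line, and a
# 3-field line that both versions should skip.
printf '%s\n' \
  'Internet 10.0.0.1 5 aaaa.bbbb.cccc ARPA Vlan10' \
  '10.0.0.2 ether 00:11:22:33:44:55 eth0' \
  'header line here' > "$tmpdir/zTempMonth"

# Pure-bash version (same logic as the read -a loop).
while read -ra line; do
  case "${#line[@]}" in
    6) printf '%s %s %s\n' "${line[1]}" "${line[3]}" "${line[5]}";;
    4) printf '%s %s %s\n' "${line[0]}" "${line[2]}" "${line[3]}";;
  esac
done < "$tmpdir/zTempMonth" > "$tmpdir/bash.out"

# awk version; "print a, b, c" joins fields with OFS, a single space by default.
awk 'NF == 6 { print $2, $4, $6 }
     NF == 4 { print $1, $3, $4 }' "$tmpdir/zTempMonth" > "$tmpdir/awk.out"

if cmp -s "$tmpdir/bash.out" "$tmpdir/awk.out"; then
  echo "outputs match"
else
  echo "outputs differ"
fi
cat "$tmpdir/awk.out"
```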

Upvotes: 11
