shigutso

Reputation: 33

ShellScript: grep+while+cut+awk in a large file = very slow

I have this script running on a 1.7GB text file.

#!/bin/bash

File1=$1.tmp
File2=$1.modified

# Comment lines go straight to the output; all other lines to a temp file
grep '^#' "$1" > "$File2"
grep -v '^#' "$1" > "$File1"

while read line; do
        column_four=$(echo $line | cut -d " " -f4)
        if [ "$column_four" == "0" ]; then
                # Full record: remember columns 1-3 for the short records that follow
                beginning_line=$(echo $line | cut -d " " -f1-3)
                final_line=$(echo $line | cut -d " " -f4-5)
        else
                # Short record: reuse beginning_line from the last full record
                final_line=$(echo $line | cut -d " " -f1-2)
        fi
        linef="$beginning_line $final_line"
        echo $linef | awk '{printf "%5.0f%12.4f%12.4f%5.0f%12.4f\n", $1, $2, $3, $4, $5}' >> "$File2"
done < "$File1"
rm -f "$File1"

The problem: it's very, very slow. It produces the rearranged output at about 200KB per minute on a Core2Duo. How can I make it faster?

Thank you.

Upvotes: 3

Views: 1448

Answers (2)

Zsolt Botykai

Reputation: 51643

You can do the whole thing in awk, as far as I can see. Something like:

awk '/^#/ { print $0 >> "File2" ; next }
     $0 !~ /#/ { if ( $4 == "0" ) {
                   f1 = $1 ; f2 = $2 ; f3 = $3
                   printf("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, $4, $5) >> "File2"
                 } else {
                   printf("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, $1, $2) >> "File2"
                 }
               }' INPUTFILE

Note the string compare `$4 == "0"`: it matches the shell test in the question, and it does not fire on short records where the fourth field is empty (a numeric `$4 == 0` would).
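
If it helps, here is a runnable sketch of the same idea as a complete one-pass script. The wiring is my assumption, not part of the answer above: it takes the question's $1.modified output name and writes through an awk variable instead of the literal "File2". One behavioral difference worth knowing: a single pass keeps comment lines in their original positions, whereas the question's script moves them all to the top of the output.

#!/bin/bash
# One pass over the input: comments are copied through, data lines reformatted.
awk -v out="$1.modified" '
    /^#/ { print > out ; next }                  # comment line: copy as-is
    $4 == "0" { f1 = $1 ; f2 = $2 ; f3 = $3      # full record: remember columns 1-3
                printf("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, $4, $5) > out
                next }
    { printf("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, $1, $2) > out }
' "$1"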

Upvotes: 3

beny23

Reputation: 35038

I would do away with the loop and use a single invocation of awk:

awk '
{
    if ($4 == "0") {
       # full record: remember columns 1-3 for subsequent short records
       f1 = $1;
       f2 = $2;
       f3 = $3;
       f4 = $4;
       f5 = $5;
    } else {
       # short record: keep f1-f3 from the last full record
       f4 = $1;
       f5 = $2;
    }
    printf ("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, f4, f5);
}' < "$File1" >> "$File2"

That way you're not invoking awk, echo and cut multiple times per line of your input file; instead you run a single awk process over the whole file.
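
For completeness, a sketch of how this could drop into the question's script, keeping its File1/File2 names and grep pre-pass (my assembly, untested against the real data):

#!/bin/bash

File1=$1.tmp
File2=$1.modified

grep '^#' "$1" > "$File2"        # comments first, as the original script does
grep -v '^#' "$1" > "$File1"

awk '{
    if ($4 == "0") { f1 = $1; f2 = $2; f3 = $3; f4 = $4; f5 = $5 }
    else           { f4 = $1; f5 = $2 }
    printf ("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, f4, f5);
}' < "$File1" >> "$File2"
rm -f "$File1"

Running the original and this version under time should make the difference obvious, since the per-line fork/exec overhead dominates the original loop.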

Upvotes: 3
