shigutso

Reputation: 33

ShellScript: grep+while+cut+awk in a large file = very slow

I have this script running on a 1.7GB text file.

#!/bin/bash

File1=$1.tmp
File2=$1.modified

# Comment lines go straight to the output; all other lines to a temp file
grep '^#' "$1" > "$File2"
grep -v '^#' "$1" > "$File1"

while read line; do
        column_four=$(echo $line | cut -d " " -f4)
        if [ "$column_four" == "0" ]; then
                # Full record: remember columns 1-3 for the short records that follow
                beginning_line=$(echo $line | cut -d " " -f1-3)
                final_line=$(echo $line | cut -d " " -f4-5)
        else
                # Short record: reuse beginning_line from the last full record
                final_line=$(echo $line | cut -d " " -f1-2)
        fi
        linef="$beginning_line $final_line"
        echo $linef | awk '{printf "%5.0f%12.4f%12.4f%5.0f%12.4f\n", $1, $2, $3, $4, $5}' >> "$File2"
done < "$File1"
rm -f "$File1"

The problem: it's very, very slow. It produces the rearranged output at about 200KB per minute on a Core2Duo. How can I make it faster?

Thank you.

Upvotes: 3

Views: 1448

Answers (2)

Zsolt Botykai

Reputation: 51643

You can do the whole thing in awk, as far as I can see. Something like:

awk '/^#/ { print $0 >> "File2" ; next }
     $0 !~ /#/ { if ( $4 == "0" ) {
                   f1 = $1 ; f2 = $2 ; f3 = $3
                   printf("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, $4, $5) >> "File2"
                 } else {
                   printf("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, $1, $2) >> "File2"
                 }
               }' INPUTFILE

Note the string compare `$4 == "0"`: it matches the shell test in the question, and it does not fire on short records where the fourth field is empty (a numeric `$4 == 0` would).
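
If it helps, here is a runnable sketch of the same idea as a complete one-pass script. The wiring is my assumption, not part of the answer above: it takes the question's $1.modified output name and writes through an awk variable instead of the literal "File2". One behavioral difference worth knowing: a single pass keeps comment lines in their original positions, whereas the question's script moves them all to the top of the output.

#!/bin/bash
# One pass over the input: comments are copied through, data lines reformatted.
awk -v out="$1.modified" '
    /^#/ { print > out ; next }                  # comment line: copy as-is
    $4 == "0" { f1 = $1 ; f2 = $2 ; f3 = $3      # full record: remember columns 1-3
                printf("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, $4, $5) > out
                next }
    { printf("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, $1, $2) > out }
' "$1"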

Upvotes: 3

beny23

Reputation: 35038

I would do away with the loop and use a single invocation of awk:

awk '
{
    if ($4 == "0") {
       # full record: remember columns 1-3 for subsequent short records
       f1 = $1;
       f2 = $2;
       f3 = $3;
       f4 = $4;
       f5 = $5;
    } else {
       # short record: keep f1-f3 from the last full record
       f4 = $1;
       f5 = $2;
    }
    printf ("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, f4, f5);
}' < "$File1" >> "$File2"

That way you're not invoking awk, echo and cut multiple times per line of your input file; instead you run a single awk process over the whole file.
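
For completeness, a sketch of how this could drop into the question's script, keeping its File1/File2 names and grep pre-pass (my assembly, untested against the real data):

#!/bin/bash

File1=$1.tmp
File2=$1.modified

grep '^#' "$1" > "$File2"        # comments first, as the original script does
grep -v '^#' "$1" > "$File1"

awk '{
    if ($4 == "0") { f1 = $1; f2 = $2; f3 = $3; f4 = $4; f5 = $5 }
    else           { f4 = $1; f5 = $2 }
    printf ("%5.0f%12.4f%12.4f%5.0f%12.4f\n", f1, f2, f3, f4, f5);
}' < "$File1" >> "$File2"
rm -f "$File1"

Running the original and this version under time should make the difference obvious, since the per-line fork/exec overhead dominates the original loop.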

Upvotes: 3
