Reputation: 11

Faster execution time for bash script

Hi, I wrote a script that generates every combination of lines from two files, but each file has 1000 lines, and with awk and echo it takes far too long to generate the output file. Is there any way to do the same thing faster?

Example:

fileA.txt is:
dog
cat
horse
fish

fileB.txt is:
good
bad
pretty
ugly

I need fileC to be like:
doggood
dogbad
dogpretty
dogugly
catgood
catbad
catpretty
catugly
etc

Here's the code:

#!/bin/bash
numA=1
while [ $numA -le 1000 ]; do
    numB=1
    while [ $numB -le 1000 ]; do
        string1=$(awk "NR==$numA" fileA.txt)
        string2=$(awk "NR==$numB" fileB.txt)
        string3="$string1$string2"
        echo "$string3" >> fileC.txt
        numB=$(($numB+1))
    done
    numA=$(($numA+1))
done

It would take weeks. I am new to bash scripting, so if someone has an idea, a code example would be appreciated. Thanks.

Upvotes: 0

Answers (3)

Socowi

Reputation: 27215

Just for fun

A hacky way to build the Cartesian product of two files A and B:

xargs -a A -n1 -d\\n xargs -a B -n1 -d\\n printf %s%s\\n

This will be slow too, because xargs starts a new printf process for each line of output. You could drastically speed this up using ...

xargs -a A -d\\n -I{} xargs -a B -d\\n printf {}%s\\n

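... but that would make the command unsafe, because printf would interpret % and \ inside lines of file A as part of its format string. To fix this, double those characters before they reach printf, so that printf sees %% and \\ and prints them literally:

sed 's/[%\\]/&&/g' A | xargs -d\\n -I{} xargs -a B -d\\n printf {}%s\\n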
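
To see why the format string matters, here is a minimal sketch that uses only printf and sed (no xargs). The sample line is hypothetical, and doubling % and \ is one possible way to neutralise them, not necessarily the only one:

```shell
#!/bin/sh
# Hypothetical line from file A that contains printf metacharacters.
line='100%s\n'

# Unsafe: the line becomes part of the format string, so its %s
# swallows the argument and its \n turns into a real newline.
unsafe=$(printf "${line}%s\n" good)

# Safer: double every % and \ first, so printf prints them literally.
escaped=$(printf '%s\n' "$line" | sed 's/[%\\]/&&/g')
safe=$(printf "${escaped}%s\n" good)

printf 'unsafe: %s\n' "$unsafe"   # unsafe: 100good
printf 'safe:   %s\n' "$safe"     # safe:   100%s\ngood
```

The second result keeps the line from A exactly as written and appends the line from B, which is what the Cartesian product needs.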

Upvotes: 0

user14473238

Reputation:

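If one of the files can fit in memory, load it into an array first and loop over that array for every line of the other file. The file listed first is the one held in memory, and it supplies the second half of each output line:

awk 'NR==FNR {a[++n]=$0; next} {for (i=1; i<=n; ++i) print $0 a[i]}' fileB fileA

With the example input from the question: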

#!/bin/sh -

awk '
  NR==FNR {
      a[++n]=$0
      next
  }

  {
      for (i=1; i<=n; ++i) {
          print $0 a[i]
      }
  }
' fileB.txt fileA.txt > fileC.txt
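
As a quick check, the sketch below recreates the question's sample files in a scratch directory and runs the same awk program on them. The use of mktemp and the array name b are assumptions made for this illustration:

```shell
#!/bin/sh
# Work in a scratch directory (assumes mktemp is available).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Sample data from the question.
printf '%s\n' dog cat horse fish > fileA.txt
printf '%s\n' good bad pretty ugly > fileB.txt

# First pass (NR==FNR) stores fileB.txt in b[]; second pass pairs
# each line of fileA.txt with every stored line.
awk 'NR==FNR { b[++n] = $0; next }
     { for (i = 1; i <= n; ++i) print $0 b[i] }' fileB.txt fileA.txt > fileC.txt

head -n 4 fileC.txt   # doggood, dogbad, dogpretty, dogugly
```

With 4 lines in each file, fileC.txt ends up with 16 lines, from doggood to fishugly. Each input file is read only once, and only a single awk process is started.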

Upvotes: 2

Barmar

Reputation: 780929

Don't use awk to get the current line of the file; it has to read the entire file each time, and the script starts two awk processes for every output line. Just read the files in a loop.

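# IFS= keeps leading and trailing whitespace; -r keeps backslashes as-is
while IFS= read -r string1; do
    while IFS= read -r string2; do
        # printf prints lines such as -n or -e literally, unlike echo
        printf '%s%s\n' "$string1" "$string2"
    done < fileB.txt
done < fileA.txt > fileC.txt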
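
The nested loop still reopens fileB.txt once per line of fileA.txt. If both files fit in memory, a variant that loads them into arrays first avoids that. This is a different technique from the loop above, sketched under the assumption that bash 4 or later is available (for mapfile); the function name combine is made up for this example:

```shell
#!/bin/bash
# combine FILE_A FILE_B: print every line of FILE_A followed by every
# line of FILE_B. Hypothetical helper; requires bash 4+ for mapfile.
combine() {
    local -a a b
    local x y
    mapfile -t a < "$1"   # read each file exactly once
    mapfile -t b < "$2"
    for x in "${a[@]}"; do
        for y in "${b[@]}"; do
            printf '%s%s\n' "$x" "$y"
        done
    done
}
```

It would be called as combine fileA.txt fileB.txt > fileC.txt. No external process is started inside the loops, so the cost is dominated by the one million printf calls of the shell builtin.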

Upvotes: 1
