Reputation: 11
Hi: I wrote a script that combines every line of one file with every line of another, but each file has 1000 lines, and with awk and echo it takes so long to generate the output file. Is there any way to do the same thing faster?
Example:
fileA.txt is:
dog
cat
horse
fish
fileB.txt is:
good
bad
pretty
ugly
I need fileC.txt to look like:
doggood
dogbad
dogpretty
dogugly
catgood
catbad
catpretty
catugly
etc.
Here's the code:
#!/bin/bash
numA=1
while [ $numA -le 1000 ]; do
    numB=1
    while [ $numB -le 1000 ]; do
        string1=$(awk "NR==$numA" fileA.txt)
        string2=$(awk "NR==$numB" fileB.txt)
        string3="$string1$string2"
        echo "$string3" >> fileC.txt
        numB=$(($numB+1))
    done
    numA=$(($numA+1))
done
It will take weeks. I am new to bash scripting, so any idea would help, ideally with a code example. Thanks
Upvotes: 0
Views: 698
Reputation: 27215
A hacky way to build the Cartesian product of two files A and B (this needs GNU xargs, since -a and -d are GNU extensions):
xargs -a A -n1 -d\\n xargs -a B -n1 -d\\n printf %s%s\\n
This will be slow too, because xargs starts a new printf process for each line of output. You could drastically speed this up using ...
xargs -a A -d\\n -I{} xargs -a B -d\\n printf {}%s\\n
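... but that would make the command unsafe, because printf would interpret % and \ inside lines of file A as format characters. To fix this, double those characters first (printf prints %% as a literal % and \\ as a literal backslash):
sed 's/[%\\]/&&/g' A | xargs -d\\n -I{} xargs -a B -d\\n printf {}%s\\n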
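A quick way to check the escaping is to feed the pipeline lines that contain % and \. The sketch below assumes GNU xargs and escapes by doubling each % and \ (printf's own way of printing them literally); the throwaway files A, B and C and their contents are made up for the test and live in a temporary directory.

```bash
#!/bin/bash
# Sketch: check that the escaped xargs pipeline copes with % and \ in file A.
# Assumes GNU xargs (-a and -d are GNU extensions); A, B, C are throwaway test files.
set -eu
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' '50%' 'back\slash' > A   # lines that would break an unescaped printf format
printf '%s\n' 'x' 'y' > B

# Double every % and \ so printf prints them literally,
# then run one printf per line of A with all lines of B as arguments.
sed 's/[%\\]/&&/g' A | xargs -d '\n' -I{} xargs -a B -d '\n' printf '{}%s\n' > C

cat C
```

Each line of A should appear unchanged in front of each line of B, so C holds 50%x, 50%y, back\slashx and back\slashy.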
Upvotes: 0
Reputation:
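If one of the files fits in memory, read it into an array and loop over the array for each line of the other file. The file named first is the one held in memory, so to get doggood, dogbad, ... pass fileB.txt first:
awk 'NR==FNR {a[++n]=$0; next} {for (i=1; i<=n; ++i) print $0 a[i]}' fileB.txt fileA.txt > fileC.txt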
As a script, using the example files from the question:
#!/bin/sh -
awk '
    NR==FNR {
        a[++n]=$0
        next
    }
    {
        for (i=1; i<=n; ++i) {
            print $0 a[i]
        }
    }
' fileB.txt fileA.txt > fileC.txt
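To check the argument order on the question's own sample data, the sketch below recreates fileA.txt and fileB.txt in a temporary directory (an assumption made only so the demo is self-contained) and runs the same awk program. Only the first file named, fileB.txt, is stored in the array, so that should be the smaller of the two.

```bash
#!/bin/bash
# Sketch: run the in-memory awk approach on the sample data from the question.
# The temporary directory and sample files exist only for this demo.
set -eu
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' dog cat horse fish > fileA.txt
printf '%s\n' good bad pretty ugly > fileB.txt

awk '
    NR == FNR { b[++n] = $0; next }              # first file: store every line
    { for (i = 1; i <= n; ++i) print $0 b[i] }   # second file: pair each line with all stored lines
' fileB.txt fileA.txt > fileC.txt

head -n 5 fileC.txt   # doggood, dogbad, dogpretty, dogugly, catgood (one per line)
```

Each file is read exactly once, which is why this runs in well under a second for two 1000-line files instead of starting two awk processes per output line.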
Upvotes: 2
Reputation: 780929
Don't use awk to get the current line of the file: every call starts a new awk process that reads the entire file. Just read the files in a loop.
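# IFS= keeps leading/trailing spaces; printf avoids echo mangling lines like "-n"
while IFS= read -r string1; do
    while IFS= read -r string2; do
        printf '%s%s\n' "$string1" "$string2"
    done < fileB.txt
done < fileA.txt > fileC.txt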
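A possible variation on this loop, assuming bash 4 or later for mapfile: read fileB.txt into an array once instead of reopening and re-reading it for every line of fileA.txt. The array name words and the temporary directory with sample files are illustrative choices, not part of the original answer.

```bash
#!/bin/bash
# Sketch (assumes bash 4+ for mapfile): load fileB.txt once, then pair it with each line of fileA.txt.
# The temporary directory and sample files exist only for this demo.
set -e
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' dog cat horse fish > fileA.txt
printf '%s\n' good bad pretty ugly > fileB.txt

mapfile -t words < fileB.txt   # every line of fileB.txt, trailing newline stripped

while IFS= read -r string1; do
    for string2 in "${words[@]}"; do
        printf '%s%s\n' "$string1" "$string2"
    done
done < fileA.txt > fileC.txt

head -n 4 fileC.txt
```

This still runs one bash iteration per output line, so for large inputs the awk answer should remain the faster option.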
Upvotes: 1