jeffpkamp

Reputation: 2866

Awk/Bash writing script

I want to write code that will output an awk and bash script. The script basically cuts a file into small pieces so that a program can run on them in parallel, and I want to control the number of pieces rather than having a fixed number as I do now. My current code cuts the file into 10 parts using awk and then runs a bash script on each part.

awk -v a="$a" '{if (NR<(a/10)&&NR>=0) print }' "$1" > "${1}1"
awk -v a="$a" '{if (NR<(a/10*2)&&NR>=(a/10*1)) print }' "$1" > "${1}2"
awk -v a="$a" '{if (NR<(a/10*3)&&NR>=(a/10*2)) print }' "$1" > "${1}3"
awk -v a="$a" '{if (NR<(a/10*4)&&NR>=(a/10*3)) print }' "$1" > "${1}4"
awk -v a="$a" '{if (NR<(a/10*5)&&NR>=(a/10*4)) print }' "$1" > "${1}5"
awk -v a="$a" '{if (NR<(a/10*6)&&NR>=(a/10*5)) print }' "$1" > "${1}6"
awk -v a="$a" '{if (NR<(a/10*7)&&NR>=(a/10*6)) print }' "$1" > "${1}7"
awk -v a="$a" '{if (NR<(a/10*8)&&NR>=(a/10*7)) print }' "$1" > "${1}8"
awk -v a="$a" '{if (NR<(a/10*9)&&NR>=(a/10*8)) print }' "$1" > "${1}9"
awk -v a="$a" '{if (NR<=(a/10*10)&&NR>=(a/10*9)) print }' "$1" > "${1}10"

bash "$2" "${1}1" &
bash "$2" "${1}2" &
bash "$2" "${1}3" &
bash "$2" "${1}4" &
bash "$2" "${1}5" &
bash "$2" "${1}6" &
bash "$2" "${1}7" &
bash "$2" "${1}8" &
bash "$2" "${1}9" &
bash "$2" "${1}10" &

I want to be able to pass in 20 and have it do the same thing for 20 pieces. I just can't work out how to turn this into a loop.

Thanks for the help.

EDIT

Some more information on the variables:

a=`wc -l $1 | cut -f 1 -d " "`

I'm also not sure how to write a loop that produces the equivalent of:

cat "${1}1.tab" "${1}2.tab" "${1}3.tab" "${1}4.tab" "${1}5.tab" "${1}6.tab" "${1}7.tab" "${1}8.tab" "${1}9.tab" "${1}10.tab" > "$3"

Upvotes: 1

Views: 537

Answers (2)

Jonathan Leffler

Reputation: 754160

This answer doesn't explore alternatives like using split or csplit to partition the file.

Assuming that a=$(wc -l < "$1") and that $3 contains the number of fragments (10 in the example written out longhand), you can take your existing code and package it as one or two loops, using seq to generate the numbers you need:

a=$(wc -l < "$1")
n=${3:-10}
for i in $(seq 1 "$n")
do
    # a = number of records in file
    # n = number of parts the file is to be split into
    # p = part number of current part
    # Part p gets the lines with a*(p-1)/n < NR <= a*p/n, so the last line is not dropped
    awk -v a="$a" -v n="$n" -v p="$i" '{if (NR > a*(p-1)/n && NR <= a*p/n) print }' "$1" > "$1.$i"
    bash "$2" "$1.$i" &
done
wait   # For all the background processes to complete

That's the single loop version; you can create all the files first and then run a second loop to create all the background processes.
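For the two-loop version, a minimal sketch might look like this. It wraps the logic in a function (the name split_and_run is hypothetical) so the arguments are explicit, and it uses the same inclusive upper bound as the loop above so that every line lands in exactly one piece:

```shell
# Hypothetical helper: split_and_run FILE SCRIPT [PIECES]
# Loop 1 writes all the pieces; loop 2 starts one background job per piece.
split_and_run() {
    local input=$1 script=$2 n=${3:-10}
    local a i
    a=$(wc -l < "$input")   # number of records in the file

    # Loop 1: piece p gets the lines with a*(p-1)/n < NR <= a*p/n
    # (pieces can be empty when n is larger than the line count).
    for i in $(seq 1 "$n"); do
        awk -v a="$a" -v n="$n" -v p="$i" 'NR > a*(p-1)/n && NR <= a*p/n' "$input" > "$input.$i"
    done

    # Loop 2: only starts once every piece exists.
    for i in $(seq 1 "$n"); do
        bash "$script" "$input.$i" &
    done
    wait   # return only after every background job has finished
}
```

Something like split_and_run "$1" "$2" 20 would then give 20 pieces. Writing all the pieces first costs nothing extra here, and it means no job starts while awk is still reading the input.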

I strongly suspect that you could use a single awk script to split the file:

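a=$(wc -l < "$1")
n=${3:-10}
awk -v a="$a" -v n="$n" -v f="$1" \
   '{   # part number = ceiling(n*NR/a), so the last line goes to part n
        nfn = int((n*NR + a - 1) / a);
        if (nfn != ofn)
        {
            if (ofile != "") close(ofile);   # keep only one output file open
            ofile = sprintf("%s.%d", f, nfn);
            ofn = nfn;
        }
        print > ofile
    }' "$1"

for i in $(seq 1 "$n")
do
    bash "$2" "$1.$i" &
done
wait   # For all the background processes to complete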
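The edit to the question also asks for a loop to replace the long cat line. A sketch, assuming the per-piece script writes its result to <piece>.tab (as the question's .tab names suggest) and that the pieces are named $1.1, $1.2, ... as in the loops above; join_results is a hypothetical name, and it should only be called after wait:

```shell
# Hypothetical helper: join_results BASE PIECES OUTPUT
# Concatenates BASE.1.tab ... BASE.PIECES.tab, in numeric order, into OUTPUT.
join_results() {
    local base=$1 n=$2 out=$3 i
    for i in $(seq 1 "$n"); do
        cat "$base.$i.tab"
    done > "$out"   # one redirection for the whole loop, so OUTPUT is truncated once
}
```

Looping over seq keeps the pieces in numeric order; a glob such as "$1".*.tab would sort .10.tab before .2.tab.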

None of this code has been run through awk or bash, so there could be mistakes in it.

Upvotes: 1

perreal

Reputation: 97968

split looks like a simpler alternative:

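INPUT=$1  # input file
N=$2      # number of lines per file
SCRIPT=$3 # script to run

mkdir -p chunks && cd chunks || exit 1
split -l "$N" "../$INPUT"   # options before the file operand, as POSIX expects
for file in *; do
   bash "../$SCRIPT" "$file" &
done
wait   # for all the background jobs to finish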
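This takes a line count per file, while the question asks for a number of pieces. A sketch that derives the line count from the desired number of pieces with ceiling division; the function name split_into and the DIR argument are hypothetical, and using split's prefix operand avoids the cd:

```shell
# Hypothetical helper: split_into PIECES INPUT SCRIPT DIR
# Splits INPUT into at most PIECES files under DIR and runs SCRIPT on each.
split_into() {
    local pieces=$1 input=$2 script=$3 dir=$4
    local lines per f
    lines=$(wc -l < "$input")
    per=$(( (lines + pieces - 1) / pieces ))   # ceiling of lines/pieces
    [ "$per" -gt 0 ] || per=1                  # split -l needs a positive count
    mkdir -p "$dir" || return 1
    split -l "$per" "$input" "$dir/part."      # writes part.aa, part.ab, ...
    for f in "$dir"/part.*; do
        [ -e "$f" ] || continue                # empty input: the glob matched nothing
        bash "$script" "$f" &
    done
    wait
}
```

Because the chunk size is rounded up, you can get fewer files than requested (9 lines into 4 pieces gives 3 files of 3 lines). GNU split can also be asked for a piece count directly with -n l/N, which avoids breaking lines but balances the pieces by bytes rather than by line count; that option is a GNU extension, not POSIX.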

Upvotes: 1
