I want to write code that generates a set of awk and bash commands. The script cuts a file into small pieces so that a program can run on them in parallel, and I want to control the number of pieces rather than having a fixed number as I do now. My current code cuts the file into 10 parts using awk and then runs a bash script on each part:
awk -v a=$a '{if (NR<(a/10)&&NR>=0) print }' $1 > $11
awk -v a=$a '{if (NR<(a/10*2)&&NR>=(a/10*1)) print }' $1 >$12
awk -v a=$a '{if (NR<(a/10*3)&&NR>=(a/10*2)) print }' $1 >$13
awk -v a=$a '{if (NR<(a/10*4)&&NR>=(a/10*3)) print }' $1 >$14
awk -v a=$a '{if (NR<(a/10*5)&&NR>=(a/10*4)) print }' $1 >$15
awk -v a=$a '{if (NR<(a/10*6)&&NR>=(a/10*5)) print }' $1 >$16
awk -v a=$a '{if (NR<(a/10*7)&&NR>=(a/10*6)) print }' $1 >$17
awk -v a=$a '{if (NR<(a/10*8)&&NR>=(a/10*7)) print }' $1 >$18
awk -v a=$a '{if (NR<(a/10*9)&&NR>=(a/10*8)) print }' $1 >$19
awk -v a=$a '{if (NR<=(a/10*10)&&NR>=(a/10*9)) print }' $1 >$110
bash $2 $11&
bash $2 $12&
bash $2 $13&
bash $2 $14&
bash $2 $15&
bash $2 $16&
bash $2 $17&
bash $2 $18&
bash $2 $19&
bash $2 $110&
I want to be able to pass in a number such as 20 and have the script write out these commands 20 times (20 pieces, 20 jobs). I just can't work out how to do this with a loop.
Thanks for the help.
EDIT
Some more information on the variables:
a=`wc -l $1 | cut -f 1 -d " "`
I'm also not sure how to write a loop that produces the equivalent of this:
cat $11.tab $12.tab $13.tab $14.tab $15.tab $16.tab $17.tab $18.tab $19.tab $110.tab > $3
This answer doesn't explore alternatives such as using split or csplit to partition the file.
Assuming that a=$(wc -l < "$1") and that $3 contains the number of fragments (10 in the example written out longhand), you can take your existing code and package it as one or two loops, using seq to generate the numbers you need. (Your cat example uses $3 for the output file, so one of the two needs to move to a different argument position.)
a=$(wc -l < "$1")
n=${3:-10}
for i in $(seq 1 "$n")
do
    # a = number of records in file
    # n = number of parts the file is to be split into
    # p = part number of current part
    # Part p gets the lines with a*(p-1)/n < NR <= a*p/n, so the last line (NR == a) lands in part n
    awk -v a="$a" -v n="$n" -v p="$i" 'NR > a*(p-1)/n && NR <= a*p/n' "$1" > "$1.$i"
    bash "$2" "$1.$i" &
done
wait # For all the background processes to complete
That's the single loop version; you can create all the files first and then run a second loop to create all the background processes.
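A sketch of that two-loop version, wrapped in a hypothetical function named split_then_run so it can be called with explicit arguments (file, script, number of parts, defaulting to 10 as above). The first loop writes every part file; the second starts one background job per part. The awk pattern NR > a*(p-1)/n && NR <= a*p/n puts each line from 1 to a in exactly one part. It has only been checked against a small example, so treat it as a starting point.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split FILE into N parts, then run SCRIPT on each part in parallel.
# Usage: split_then_run FILE SCRIPT [N]
split_then_run() {
    local file=$1 script=$2 n=${3:-10}
    local a i
    a=$(wc -l < "$file")    # number of lines in the input

    # Loop 1: write every part file before any job starts.
    # Part i gets the lines with a*(i-1)/n < NR <= a*i/n.
    for i in $(seq 1 "$n"); do
        awk -v a="$a" -v n="$n" -v p="$i" \
            'NR > a*(p-1)/n && NR <= a*p/n' "$file" > "$file.$i"
    done

    # Loop 2: start one background job per part.
    for i in $(seq 1 "$n"); do
        bash "$script" "$file.$i" &
    done

    wait    # block until every background job has finished
}
```

For example, split_then_run data.txt process.sh 20 would create data.txt.1 through data.txt.20 and run process.sh (a placeholder name) on each of them.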
I strongly suspect that you could use a single awk script to split the file:
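a=$(wc -l < "$1")
n=${3:-10}
awk -v a="$a" -v n="$n" -v f="$1" '
{
    # Part number for this line, 1..n (using NR-1 keeps the last line out of part n+1)
    nfn = int(n*(NR-1)/a) + 1
    if (nfn != ofn)
    {
        if (ofile != "") close(ofile)   # finished with the previous part
        ofile = sprintf("%s.%d", f, nfn)
        ofn = nfn
    }
    print > ofile
}' "$1"
for i in $(seq 1 "$n")
do
    bash "$2" "$1.$i" &
done
wait # For all the background processes to complete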
None of this code has been run through awk or bash, so there could be mistakes in it.
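The question also asks for a loop to replace the long cat command. A minimal sketch, assuming (hypothetically) that the job run on part file "$1.$i" writes its results to "$1.$i.tab", and that it is called only after wait has returned. The function name collect_results is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate per-part results in numeric part order.
# Assumes the job for part file "$file.$i" wrote its output to "$file.$i.tab".
# Usage: collect_results FILE N OUTPUT
collect_results() {
    local file=$1 n=$2 out=$3 i
    for i in $(seq 1 "$n"); do
        cat "$file.$i.tab"
    done > "$out"    # one redirection for the whole loop
}
```

An explicit seq loop is used rather than a glob such as "$1".*.tab because the shell sorts glob matches lexically, so part 10 would come before part 2.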
split looks like a simpler alternative:
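INPUT=$1   # input file
N=$2       # number of lines per file
SCRIPT=$3  # script to run
mkdir -p chunks
# Options go before the operands; chunks/x is the output prefix (chunks/xaa, chunks/xab, ...)
split -l "$N" "$INPUT" chunks/x
for file in chunks/x*; do
    bash "$SCRIPT" "$file" &
done
wait # For all the background jobs to finish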
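If the goal is to choose the number of pieces rather than the number of lines per piece, GNU split can do that directly with -n l/K, which writes K files without breaking any line. This is a GNU coreutils extension, not POSIX, and it balances chunks by bytes, so the pieces may not have exactly equal line counts. The function name split_into_parts and the part. prefix are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split INPUT into PARTS line-preserving chunks, run SCRIPT on each.
# Requires GNU split (the -n l/K form is a GNU extension).
# Usage: split_into_parts INPUT PARTS SCRIPT OUTDIR
split_into_parts() {
    local input=$1 parts=$2 script=$3 dir=$4 f
    mkdir -p "$dir"
    # Writes "$dir/part.aa", "$dir/part.ab", ... without splitting lines
    split -n "l/$parts" "$input" "$dir/part."
    # The glob sorts lexically, which matches split's suffix order
    for f in "$dir"/part.*; do
        bash "$script" "$f" &
    done
    wait    # block until every background job has finished
}
```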