Reputation: 205
I need to run a command on thousands of files in a directory. However, the program I'm using needs a parameters file in which the names of the input and output files are given. The command is as follows:
./program parameters_file.txt
These are the lines I need to edit in parameters_file.txt (lines 1-3). The rest of the lines (not shown) remain the same:
input_file = asd123.OK
input_file2 = asd123.TXT
outfile = asd123.RESULTS_OUT
As seen, all files have matching names and only their extension changes.
I need to loop this so that input_file, input_file2 and outfile are overwritten every time the loop restarts. Something like: edit parameters_file.txt with 1st file name, run command on 1st file, edit parameters_file.txt with 2nd file name, run command on 2nd file, etc.
Thought about:
for f in *.OK;
do
input_file = $f
input_file2 = $f.TXT
outfile = $f.RESULTS_OUT
But I don't know how to incorporate this into the command, and I can't write the loop in the parameters_file.txt itself because it will crash the program. Maybe echoing into parameters_file.txt, or overwriting it with sed?
Thanks.
Upvotes: 0
Views: 871
Reputation: 50775
Use printf in a process substitution; don't bother with replacing strings in your parameters_file.txt.
for f in *.OK; do
    # feed the program a rebuilt parameters file via process substitution
    ./program <(
        # lines 1-3, rebuilt for the current input file
        printf 'input_file = %s\ninput_file2 = %s\noutfile = %s\n' "${f%OK}"{OK,TXT,RESULTS_OUT}
        # the remaining, unchanged lines of the original file
        tail -n +4 parameters_file.txt
    )
done
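This works because the <( ... ) is expanded to a path like /dev/fd/63 that the program can open and read like an ordinary file, so parameters_file.txt itself is never modified. If your program turns out to need a real, seekable file rather than a file-descriptor path (an assumption worth checking), a minimal sketch of the same idea writing a temporary file instead would be:

for f in *.OK; do
    tmp=$(mktemp)                       # throwaway parameters file
    {
        printf 'input_file = %s\ninput_file2 = %s\noutfile = %s\n' "${f%OK}"{OK,TXT,RESULTS_OUT}
        tail -n +4 parameters_file.txt  # keep the unchanged lines
    } > "$tmp"
    ./program "$tmp"
    rm -f "$tmp"
done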
Upvotes: 2
Reputation: 207465
If you have thousands of files to process and each one takes an hour, you could consider using GNU Parallel to get 4, 8 or 16 done in parallel and keep busy all the CPU cores you paid Intel so handsomely for; otherwise you'll be there for weeks. Also, if you have multiple computers on your network, GNU Parallel can distribute your jobs and data across them too to speed things up.
So, assuming your files that need processing all end in *.OK, a basic example would be this:
parallel -k echo {#} {.} ::: ads123.OK qwe987.OK tyu456.OK
That will output this:
1 ads123
2 qwe987
3 tyu456
so hopefully you can see that {#} is just the sequentially increasing job number and {.} is the filename with the extension removed.
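A hedged aside before moving on: GNU Parallel runs one job per CPU core by default; -j caps (or raises) that number, and -S can spread jobs over other machines reachable by ssh. For example, to allow at most 8 simultaneous jobs with the toy example above:

parallel -j 8 -k echo {#} {.} ::: *.OK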
OK, now you want to process your parameter file a bit before you start each job, so you will be better off writing a bash function in which you do the pre-processing for each job, like this. I'll call the function doit():
doit(){
    jobnum=$1
    name=$2
    paramfile="parameters.$jobnum"
    echo "Processing file: $name with parameters in file: $paramfile"
}
# Make our function known to jobs started by GNU Parallel
export -f doit
# Now run the jobs
parallel -k doit {#} {.} ::: *.OK
Now all we need to do is change doit() to prepare your parameters, so we can do:
doit(){
    jobnum=$1
    name=$2
    paramfile="parameters.$jobnum"
    echo "Processing file: $name with parameters in file: $paramfile"
    # Following code supplied by @poshi
    echo "input_file = $name.OK"          >  "$paramfile"    # {.} stripped the .OK, so put it back
    echo "input_file2 = $name.TXT"        >> "$paramfile"
    echo "outfile = $name.RESULTS_OUT"    >> "$paramfile"
    # Add/copy/incorporate the rest of the parameters as you wish
    echo ./program "$paramfile"           # remove the echo to actually run the program
}
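For completeness, a sketch of what the finished doit() might look like once the remaining parameters are copied across and the program is actually run; it assumes, as in the question, that the unchanged parameters start at line 4 of parameters_file.txt:

doit(){
    jobnum=$1
    name=$2
    paramfile="parameters.$jobnum"
    # lines 1-3, rebuilt for this job
    echo "input_file = $name.OK"          >  "$paramfile"
    echo "input_file2 = $name.TXT"        >> "$paramfile"
    echo "outfile = $name.RESULTS_OUT"    >> "$paramfile"
    # lines 4 onwards, copied unchanged from the original template
    tail -n +4 parameters_file.txt        >> "$paramfile"
    ./program "$paramfile"
}
export -f doit
parallel -k doit {#} {.} ::: *.OK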
Upvotes: 3
Reputation: 5372
I guess something like this will achieve what you want:
#!/bin/bash
for file in *.OK; do
    sed -i \
        -e "s/input_file =.*/input_file = ${file}/" \
        -e "s/input_file2.*/input_file2 = ${file%.OK}.TXT/" \
        -e "s/outfile.*/outfile = ${file%.OK}.RESULTS_OUT/" \
        parameters_file.txt
    ./program parameters_file.txt
done
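Note that this rewrites parameters_file.txt in place on every pass (it works because each substitution matches whatever name was written on the previous pass). If you would rather leave the original template untouched, a small sketch, using run_params.txt as a hypothetical working-copy name, is:

#!/bin/bash
for file in *.OK; do
    cp parameters_file.txt run_params.txt   # work on a copy; the template stays intact
    sed -i \
        -e "s/input_file =.*/input_file = ${file}/" \
        -e "s/input_file2.*/input_file2 = ${file%.OK}.TXT/" \
        -e "s/outfile.*/outfile = ${file%.OK}.RESULTS_OUT/" \
        run_params.txt
    ./program run_params.txt
done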
Upvotes: 2
Reputation: 5762
Run a loop that creates the input parameter file and then runs the program:
for f in *.OK; do
    echo "input_file = $f"                  >  parameters
    echo "input_file2 = ${f%.OK}.TXT"       >> parameters
    echo "outfile = ${f%.OK}.RESULTS_OUT"   >> parameters
    # Add/copy/incorporate the rest of the parameters as you wish
    ./program parameters
done
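If more parameter lines need to be generated per run, a here-document can be easier to read than a chain of echos; a sketch of the same loop written that way:

for f in *.OK; do
    cat > parameters <<EOF
input_file = $f
input_file2 = ${f%.OK}.TXT
outfile = ${f%.OK}.RESULTS_OUT
EOF
    ./program parameters
done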
Upvotes: 2