Reputation: 205
I need to run a command on thousands of files in a directory. However, the program I'm using needs a parameters file in which the names of the input and output files are given. The command is as follows:
./program parameters_file.txt
These are the lines I need to edit in parameters_file.txt (lines 1-3). The rest of the lines (not shown) remain the same:
input_file = asd123.OK
input_file2 = asd123.TXT
outfile = asd123.RESULTS_OUT
As seen, all files have matching names and only their extension changes.
I need to loop this so that input_file, input_file2 and outfile are overwritten every time the loop restarts. Something like: edit parameters_file.txt with 1st file name, run command on 1st file, edit parameters_file.txt with 2nd file name, run command on 2nd file, etc.
Thought about:
for f in *.OK;
do
input_file = $f
input_file2 = $f.TXT
outfile = $f.RESULTS_OUT
But I don't know how to incorporate this into the command, and I can't write the loop in the parameters_file.txt itself because it will crash the program. Maybe echoing into parameters_file.txt, or overwriting it with sed?
Thanks.
Upvotes: 0
Views: 871
Reputation: 50775
Use printf in a process substitution; don't bother with replacing strings in your parameters_file.txt.
for f in *.OK; do
    # feed the program a rebuilt parameters file via process substitution
    ./program <(
        # lines 1-3, rebuilt for the current input file
        printf 'input_file = %s\ninput_file2 = %s\noutfile = %s\n' "${f%OK}"{OK,TXT,RESULTS_OUT}
        # the remaining, unchanged lines of the original file
        tail -n +4 parameters_file.txt
    )
done
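This works because the <( ... ) is expanded to a path like /dev/fd/63 that the program can open and read like an ordinary file, so parameters_file.txt itself is never modified. If your program turns out to need a real, seekable file rather than a file-descriptor path (an assumption worth checking), a minimal sketch of the same idea writing a temporary file instead would be:

for f in *.OK; do
    tmp=$(mktemp)                       # throwaway parameters file
    {
        printf 'input_file = %s\ninput_file2 = %s\noutfile = %s\n' "${f%OK}"{OK,TXT,RESULTS_OUT}
        tail -n +4 parameters_file.txt  # keep the unchanged lines
    } > "$tmp"
    ./program "$tmp"
    rm -f "$tmp"
done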
Upvotes: 2
Reputation: 207465
If you have thousands of files to process and each one takes an hour, you could consider using GNU Parallel to get 4, 8 or 16 done in parallel and keep busy all the CPU cores you paid Intel so handsomely for; otherwise you'll be there for weeks. Also, if you have multiple computers on your network, GNU Parallel can distribute your jobs and data across them too to speed things up.
So, assuming your files that need processing all end in *.OK, a basic example would be this:
parallel -k echo {#} {.} ::: ads123.OK qwe987.OK tyu456.OK
That will output this:
1 ads123
2 qwe987
3 tyu456
so hopefully you can see that {#} is just the sequentially increasing job number and {.} is the filename with the extension removed.
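A hedged aside before moving on: GNU Parallel runs one job per CPU core by default; -j caps (or raises) that number, and -S can spread jobs over other machines reachable by ssh. For example, to allow at most 8 simultaneous jobs with the toy example above:

parallel -j 8 -k echo {#} {.} ::: *.OK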
OK, now you want to process your parameter file a bit before you start each job, so you will be better off writing a bash function in which you do the pre-processing for each job, like this. I'll call the function doit():
doit(){
    jobnum=$1
    name=$2
    paramfile="parameters.$jobnum"
    echo "Processing file: $name with parameters in file: $paramfile"
}
# Make our function known to jobs started by GNU Parallel
export -f doit
# Now run the jobs
parallel -k doit {#} {.} ::: *.OK
Now all we need to do is change doit() to prepare your parameters, so we can do:
doit(){
    jobnum=$1
    name=$2
    paramfile="parameters.$jobnum"
    echo "Processing file: $name with parameters in file: $paramfile"
    # Following code supplied by @poshi
    echo "input_file = $name.OK"          >  "$paramfile"    # {.} stripped the .OK, so put it back
    echo "input_file2 = $name.TXT"        >> "$paramfile"
    echo "outfile = $name.RESULTS_OUT"    >> "$paramfile"
    # Add/copy/incorporate the rest of the parameters as you wish
    echo ./program "$paramfile"           # remove the echo to actually run the program
}
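For completeness, a sketch of what the finished doit() might look like once the remaining parameters are copied across and the program is actually run; it assumes, as in the question, that the unchanged parameters start at line 4 of parameters_file.txt:

doit(){
    jobnum=$1
    name=$2
    paramfile="parameters.$jobnum"
    # lines 1-3, rebuilt for this job
    echo "input_file = $name.OK"          >  "$paramfile"
    echo "input_file2 = $name.TXT"        >> "$paramfile"
    echo "outfile = $name.RESULTS_OUT"    >> "$paramfile"
    # lines 4 onwards, copied unchanged from the original template
    tail -n +4 parameters_file.txt        >> "$paramfile"
    ./program "$paramfile"
}
export -f doit
parallel -k doit {#} {.} ::: *.OK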
Upvotes: 3
Reputation: 5372
I guess something like this will achieve what you want:
#!/bin/bash
for file in *.OK; do
    sed -i \
        -e "s/input_file =.*/input_file = ${file}/" \
        -e "s/input_file2.*/input_file2 = ${file%.OK}.TXT/" \
        -e "s/outfile.*/outfile = ${file%.OK}.RESULTS_OUT/" \
        parameters_file.txt
    ./program parameters_file.txt
done
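Note that this rewrites parameters_file.txt in place on every pass (it works because each substitution matches whatever name was written on the previous pass). If you would rather leave the original template untouched, a small sketch, using run_params.txt as a hypothetical working-copy name, is:

#!/bin/bash
for file in *.OK; do
    cp parameters_file.txt run_params.txt   # work on a copy; the template stays intact
    sed -i \
        -e "s/input_file =.*/input_file = ${file}/" \
        -e "s/input_file2.*/input_file2 = ${file%.OK}.TXT/" \
        -e "s/outfile.*/outfile = ${file%.OK}.RESULTS_OUT/" \
        run_params.txt
    ./program run_params.txt
done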
Upvotes: 2
Reputation: 5762
Run a loop that creates the input parameter file and then runs the program:
for f in *.OK; do
    echo "input_file = $f"                  >  parameters
    echo "input_file2 = ${f%.OK}.TXT"       >> parameters
    echo "outfile = ${f%.OK}.RESULTS_OUT"   >> parameters
    # Add/copy/incorporate the rest of the parameters as you wish
    ./program parameters
done
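If more parameter lines need to be generated per run, a here-document can be easier to read than a chain of echos; a sketch of the same loop written that way:

for f in *.OK; do
    cat > parameters <<EOF
input_file = $f
input_file2 = ${f%.OK}.TXT
outfile = ${f%.OK}.RESULTS_OUT
EOF
    ./program parameters
done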
Upvotes: 2