V. vdVorst

Reputation: 11

Run a for loop over files with multiple commands, starting the next file once the first command on the previous file is done

For a project I created multiple Python scripts, and I want to run them on a directory of files from a shell script. In this shell script I already created a for loop with multiple commands. The first command is a Python script that BLASTs the input file against a local database and takes up most of the cores. The next commands use far fewer cores but take a lot of time. It is very important that, for each file, the commands run in series. To save time, I want to alter the shell script so that once the first command has finished for a file, the remaining commands run on its output while the first command starts on the next file at the same time.

Can anybody help me with this? I tried searching myself, but I couldn't find an answer. I have not tried running this script yet, as I am currently running the Python scripts without a shell script.

This is the script so far:

#!/bin/bash
tsv=/home/user/tsv
fasta=/home/user/fasta/*
clustering=/home/user/clustering

for file in ${fasta}
do
    python blastn_new.py --fasta ${file} --tsv ${tsv}/${file}.tsv &&
    mkdir ${clustering}/${file} &&
    mkdir ${clustering}/${file}/clusters &&
    python blastparsPB.py --clusters ${clustering}/${file}/${file}.txt --fish ${tsv}/${file}.tsv --dir ${clustering}/${file}/clusters/
done

Upvotes: 1

Views: 591

Answers (1)

tripleee

Reputation: 189407

You can run the second script in the background.

The following also has some tangential comments, and reformats your code slightly.

#!/bin/bash

# You cannot have spaces around the equals signs
# Also, avoid hard-coding an absolute path
tsv=./tsv
db=./newpacbiodb/pacbiodb
clustering=./clustering

# Notice proper quoting throughout
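for file in ./fasta/*
do
    # Strip the directory part so output names don't embed ./fasta/
    base=${file##*/}
    # Run the first script in the foreground; skip this file if it fails.
    # It must not be part of the && list below, because & backgrounds
    # the entire list it terminates
    python blastn_new.py \
        --fasta "${file}" \
        --tsv "${tsv}/${base}.tsv" || continue
    # mkdir -p creates an entire path if necessary
    # (and works fine even if the directory already exists)
    mkdir -p "${clustering}/${base}/clusters" &&
    python blastparsPB.py \
        --clusters "${clustering}/${base}/${base}.txt" \
        --fish "${tsv}/${base}.tsv" \
        --dir "${clustering}/${base}/clusters/" &
done # notice the simple addition of background ^ job
wait # don't exit until the remaining background jobs have finished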

Obviously, this assumes that the second Python script tolerates another process connecting to, for example, the database for writing at the same time, but your requirements already imply that.
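If you want to check the scheduling before committing real data to it, a dry run with placeholders may help. The following is only a sketch: `stage1` and `stage2` are hypothetical stand-ins for the two Python scripts, the `sleep` durations are arbitrary, and the log file is just there to record the order of events. Keep in mind that in bash a trailing `&` applies to the whole `&&` list it ends, which is why the first step here is closed off with `|| continue` rather than chained with `&&`.

```shell
#!/bin/bash

# Temporary log recording when each step starts and finishes
log=$(mktemp)

# Hypothetical stand-in for the heavy first script (must run one at a time)
stage1() {
    echo "stage1 $1 start" >> "$log"
    sleep 1
    echo "stage1 $1 done" >> "$log"
}

# Hypothetical stand-in for the slow but light follow-up script
stage2() {
    echo "stage2 $1 start" >> "$log"
    sleep 2
    echo "stage2 $1 done" >> "$log"
}

for name in a b c
do
    # Foreground: the loop does not advance until this has finished
    stage1 "$name" || continue
    # Background: runs while stage1 of the next item is already working
    stage2 "$name" &
done

# Block until every background stage2 has finished
wait

cat "$log"
```

In the log, the `stage1` lines should appear strictly one item after another, while each `stage2` start should show up before the following item's `stage1` has finished. If you instead write `stage1 "$name" && stage2 "$name" &`, every `stage1` starts at once, which is the situation you want to avoid with a step that already uses most of the cores.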

Upvotes: 1
