ScubaChris

Reputation: 198

Bash script loop only running once

I am trying to parse an input file (my test file is 4 lines) and then query an online biological database. However, my loop seems to stop after returning the first result.

#!/bin/bash
if [ "$1" = "" ]; then
        echo "No input file to parse given. Give me a BLAST output file"
else
        file=$1
        #Extracts GI from each result and stores it on temp file.
        rm -rf /home/chris/TEMP/tempfile.txt
        awk -F '|' '{printf("%s\n",$2);}' "$file" >> /home/chris/TEMP/tempfile.txt
        #gets the species from each gi.
        input="/home/chris/TEMP/tempfile.txt"
        while read -r i
        do
                echo GI:"$i"
                /home/chris/EntrezDirect/edirect/esearch -db protein -query "$i" | /home/chris/EntrezDirect/edirect/efetch -format gpc | /home/chris/EntrezDirect/edirect/xtract -insd source organism | cut -f2
        done < "$input"
        rm -rf /home/chris/TEMP/tempfile.txt
fi

For example, my only output is

GI:751637161

Pseudomonas stutzeri group

whereas I should have 4 results. Any help appreciated and thanks in advance.

This is the format of the sample input:

TARA042SRF022_1 gi|751637161|ref|WP_041104882.1|    40.4    151 82  2   999 547 1   143 2.8e-21 110.9
TARA042SRF022_2 gi|1057355277|ref|WP_068715547.1|   62.7    263 96  1   915 133 80  342 7.1e-96 358.6
TARA042SRF022_3 gi|950462516|ref|WP_057369049.1|    38.3    47  29  0   184 44  152 198 5.1e+01 36.2
TARA042SRF022_4 gi|918428433|ref|WP_052479609.1|    37.5    48  29  1   525 668 192 238 6.1e+01 37.0

Upvotes: 4

Views: 2073

Answers (2)

chepner

Reputation: 531625

It would appear that read -r i is returning with a non-zero exit status on its second call, indicating that there is no more data to be read from the input file. This usually means that a command inside the while loop is also reading from standard input, and is consuming the remainder of the file before read has a chance.

The only candidate here is esearch, as echo does not read from standard input and the other commands are all reading from the previous command in the pipeline. Redirect standard input for esearch so that it does not consume your input data inadvertently.

while read -r i
do
    echo GI:"$i"
    /home/chris/EntrezDirect/edirect/esearch -db protein -query "$i" < /dev/null |
      /home/chris/EntrezDirect/edirect/efetch -format gpc |
      /home/chris/EntrezDirect/edirect/xtract -insd source organism |
      cut -f2 
done < "$input"

Upvotes: 5

Rolf

Reputation: 1199

Use cut to extract columns from an ASCII file: the -d option sets the delimiter and -f selects the column. Wrap everything in a loop like so:

$ cat data.txt
TARA042SRF022_1 gi|751637161|ref|WP_041104882.1|    40.4    151 82  2   999 547 1   143 2.8e-21 110.9
TARA042SRF022_2 gi|1057355277|ref|WP_068715547.1|   62.7    263 96  1   915 133 80  342 7.1e-96 358.6
TARA042SRF022_3 gi|950462516|ref|WP_057369049.1|    38.3    47  29  0   184 44  152 198 5.1e+01 36.2
TARA042SRF022_4 gi|918428433|ref|WP_052479609.1|    37.5    48  29  1   525 668 192 238 6.1e+01 37.0

$ cat t.sh
#!/bin/bash

for gi in $(cut -d"|" -f 2 data.txt); do
    echo $gi
done

$ bash t.sh
751637161
1057355277
950462516
918428433
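
Because a for loop does not read from standard input, esearch cannot swallow the rest of the list the way it can inside a while read loop. If you also want the species names, the GIs from this loop could be fed straight into the same Entrez pipeline. A rough, untested sketch (the edirect paths are the ones from the question):

#!/bin/bash

# For each GI extracted with cut, look up the source organism via Entrez Direct.
for gi in $(cut -d"|" -f 2 data.txt); do
    echo GI:"$gi"
    /home/chris/EntrezDirect/edirect/esearch -db protein -query "$gi" |
      /home/chris/EntrezDirect/edirect/efetch -format gpc |
      /home/chris/EntrezDirect/edirect/xtract -insd source organism |
      cut -f2
done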

Edit: I cannot reproduce the problem, but I suspect it is linked to newlines and/or the use of a temp file. My suggestion omits the temp file, so it does not answer your actual question, but I think it addresses your underlying problem.
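
If stray carriage returns (for example from a file edited on Windows) are what is breaking the loop, a quick way to check for and strip them is (data.txt is just a placeholder name):

# a non-zero count means CRLF line endings are present
grep -c $'\r' data.txt
# write a copy with the carriage returns removed
tr -d '\r' < data.txt > data_clean.txt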

Upvotes: 0
