Reputation: 198
I am trying to parse an input file (my test file is 4 lines) and then query an online biological database. However, my loop seems to stop after returning the first result.
#!/bin/bash
if [ "$1" = "" ]; then
    echo "No input file to parse given. Give me a BLAST output file"
else
    file=$1
    # Extracts the GI from each result and stores it in a temp file.
    rm -rf /home/chris/TEMP/tempfile.txt
    awk -F '|' '{printf("%s\n",$2);}' "$file" >> /home/chris/TEMP/tempfile.txt
    # Gets the species from each GI.
    input="/home/chris/TEMP/tempfile.txt"
    while read -r i
    do
        echo GI:"$i"
        /home/chris/EntrezDirect/edirect/esearch -db protein -query "$i" | /home/chris/EntrezDirect/edirect/efetch -format gpc | /home/chris/EntrezDirect/edirect/xtract -insd source organism | cut -f2
    done < "$input"
    rm -rf /home/chris/TEMP/tempfile.txt
fi
For example, my only output is
GI:751637161
Pseudomonas stutzeri group
whereas I should have 4 results. Any help appreciated and thanks in advance.
This is the format of the sample input:
TARA042SRF022_1 gi|751637161|ref|WP_041104882.1| 40.4 151 82 2 999 547 1 143 2.8e-21 110.9
TARA042SRF022_2 gi|1057355277|ref|WP_068715547.1| 62.7 263 96 1 915 133 80 342 7.1e-96 358.6
TARA042SRF022_3 gi|950462516|ref|WP_057369049.1| 38.3 47 29 0 184 44 152 198 5.1e+01 36.2
TARA042SRF022_4 gi|918428433|ref|WP_052479609.1| 37.5 48 29 1 525 668 192 238 6.1e+01 37.0
Upvotes: 4
Views: 2073
Reputation: 531625
It would appear that read -r i is returning with a non-zero exit status on its second call, indicating that there is no more data to be read from the input file. This usually means that a command inside the while loop is also reading from standard input and is consuming the remainder of the file before read has a chance.
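To see the effect in isolation, here is a minimal sketch in which cat stands in for any command that reads standard input inside the loop; only the first line survives:

# Demo: cat drains the remaining lines, so the second read sees end of file.
printf '1\n2\n3\n' | while read -r line; do
    echo "got: $line"
    cat > /dev/null
done
# Prints only: got: 1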
The only candidate here is esearch, as echo does not read from standard input and the other commands all read from the previous command in the pipeline. Redirect standard input for esearch so that it does not consume your input data inadvertently:
while read -r i
do
    echo GI:"$i"
    /home/chris/EntrezDirect/edirect/esearch -db protein -query "$i" < /dev/null |
        /home/chris/EntrezDirect/edirect/efetch -format gpc |
        /home/chris/EntrezDirect/edirect/xtract -insd source organism |
        cut -f2
done < "$input"
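An alternative sketch, assuming the same edirect tools are on your PATH: feed the loop through a dedicated file descriptor, so that nothing in the loop body can consume its input even if it reads standard input.

# Sketch: read the GI list from file descriptor 3 instead of stdin.
while read -r -u 3 i
do
    echo GI:"$i"
    esearch -db protein -query "$i" |
        efetch -format gpc |
        xtract -insd source organism |
        cut -f2
done 3< "$input"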
Upvotes: 5
Reputation: 1199
Use cut to extract columns from an ASCII file; the -d option denotes the delimiter and -f specifies the column. Wrap everything in a loop like so:
$ cat data.txt
TARA042SRF022_1 gi|751637161|ref|WP_041104882.1| 40.4 151 82 2 999 547 1 143 2.8e-21 110.9
TARA042SRF022_2 gi|1057355277|ref|WP_068715547.1| 62.7 263 96 1 915 133 80 342 7.1e-96 358.6
TARA042SRF022_3 gi|950462516|ref|WP_057369049.1| 38.3 47 29 0 184 44 152 198 5.1e+01 36.2
TARA042SRF022_4 gi|918428433|ref|WP_052479609.1| 37.5 48 29 1 525 668 192 238 6.1e+01 37.0
$ cat t.sh
#!/bin/bash
for gi in $(cut -d"|" -f 2 data.txt); do
    echo "$gi"
done
$ bash t.sh
751637161
1057355277
950462516
918428433
Edit: I cannot reproduce the problem, but I suspect it is linked to newlines and/or the use of a temp file. My suggestion avoids the temp file, so it does not answer your exact question, but it may address your underlying problem.
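For completeness, a rough sketch of how this cut-based extraction could drive the per-GI lookup from the question directly, with no temp file; it assumes the edirect tools (esearch, efetch, xtract) are on PATH. Since a for loop expands its word list up front and never reads standard input, esearch cannot swallow the remaining GIs here.

#!/bin/bash
# Sketch only: pull the GI from the second |-delimited column and
# look up the source organism for each one.
for gi in $(cut -d"|" -f 2 data.txt); do
    echo "GI:$gi"
    esearch -db protein -query "$gi" |
        efetch -format gpc |
        xtract -insd source organism |
        cut -f2
done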
Upvotes: 0