Matt

Reputation: 73

Running remote BLAST searches from Biopython

I'm trying to run remote BLAST searches for about 70 sequences of 200 nt each using Biopython. I've spent hours trying to figure out why the following Python script won't work.

I can get it to work for an input file that contains just one FASTA record using SeqIO.read, but when I switch to SeqIO.parse, nothing ends up in the .xml save file that I create.

Any ideas?

As a side note, if anyone knows the option syntax for excluding organisms from the results (as is possible when using the NCBI website), please let me know.

Thanks very much for any help.

Matt

from Bio.Blast import NCBIWWW
from Bio import SeqIO
import tkinter.filedialog as tkfd
in_file=tkfd.askopenfilename()
record = SeqIO.parse(in_file, format="fasta")
out_file = tkfd.asksaveasfilename()
save_file = open(out_file, "w")
for rec in record:
  print(rec)
  result_handle = NCBIWWW.qblast("blastn", "nt", rec.format("fasta"))
  save_file.write(result_handle.read())
  result_handle.close()
else:
  save_file.close()

This is the content of the in_file that I'm using as a test file (my file is wrapped at 80 characters per line, which might have been lost below; also, the blank line shown below between records is not in my test file):

>165613
TAACTGCAGTGTTTTGTGTCGAGCCTTTTTTGTGCCTTTTTTATAAAGGCATAACGTTATATTTAATTGAAGAGTTTGAT
TCTGGCTCAGATTGAACGCTAGCGGCATGCTTAACACATGCAAGTCGAACGGCAGCGCGGGGAGCTTGCTCCCTGGCGGC
GAGTGGCGGACGGGTGAGTAATGCGTAGGAATCTACCTTG

>165875
GGGATCTTCGGACCTCGTGCTATAAGATGAGCCTACGTCGGATTAGCTTGTTGGTGGGGTAATGGCCTACCAAGGCGACG
ATCCGTAGCTGGTCTGAGAGGACGATCAGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGG
GAATATTGGACAATGGGGGAAACCCTGATCCAGCAATGCC

Upvotes: 2

Views: 487

Answers (1)

Jose Ricardo Bustos M.

Reputation: 8164

Your problem is with the tkinter library. The following code works well with Biopython. Is using a GUI mandatory?

from Bio.Blast import NCBIWWW
from Bio import SeqIO

# Context managers close both files even if a BLAST request fails.
with open("input.fasta") as in_file, open("out_file.blast", "w") as save_file:
  record = SeqIO.parse(in_file, format="fasta")
  for rec in record:
    print(rec)
    # One remote BLAST request per record; each result is appended to the same file.
    result_handle = NCBIWWW.qblast("blastn", "nt", rec.format("fasta"))
    save_file.write(result_handle.read())
    result_handle.close()

I get the following result:

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_version>BLASTN 2.2.31+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&amp;auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>nt</BlastOutput_db>
  <BlastOutput_query-ID>Query_142405</BlastOutput_query-ID>
  <BlastOutput_query-def>165613</BlastOutput_query-def>
  <BlastOutput_query-len>200</BlastOutput_query-len>
....
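One thing to keep in mind with the loop above: each qblast call returns a complete XML document, so writing all of them into a single file produces several XML documents back to back, which an XML parser will generally not accept as one document. A minimal sketch of one way around that is to save each record's result to its own file. The names `blast_each` and `fake_search` are hypothetical helpers made up for this illustration, not part of Biopython, and the sketch assumes the search callable returns a text handle.

```python
import io
from pathlib import Path


def blast_each(records, search, out_dir):
    """Run `search` once per record and save each result to its own file.

    records: iterable of (record_id, fasta_text) pairs
    search:  callable that takes FASTA text and returns a readable text
             handle (assumed here), for example
             lambda fasta: NCBIWWW.qblast("blastn", "nt", fasta)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record_id, fasta_text in records:
        handle = search(fasta_text)
        try:
            # One XML document per file keeps every file well-formed.
            path = out_dir / f"{record_id}.xml"
            path.write_text(handle.read())
        finally:
            handle.close()
        written.append(path)
    return written


def fake_search(fasta_text):
    # Offline stand-in for the remote call, so the sketch can be tried
    # without network access.
    return io.StringIO("<BlastOutput/>")
```

With Biopython, the pairs could be produced with something like `((rec.id, rec.format("fasta")) for rec in SeqIO.parse("input.fasta", "fasta"))`, and `search` would wrap the `NCBIWWW.qblast` call shown above.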

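I tested your code and found that the parse statement is the problem. It should be:

record = SeqIO.parse(open(in_file), format="fasta")

because in_file is a string (a path), not a file object. Older versions of Biopython accept only a handle here, while recent ones also accept a filename.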
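On the side question about excluding organisms: qblast takes an `entrez_query` argument, which should play the same role as the Entrez query field on the NCBI web form. Below is a rough sketch under that assumption. The helper names `exclusion_query` and `blast_excluding` are made up for this example, and the exact query syntax (taxonomy IDs written as `txid...[ORGN]`) is my assumption about what the server accepts, so please check it against a small search first.

```python
def exclusion_query(taxids):
    """Build an Entrez query string that excludes the given NCBI taxonomy IDs."""
    terms = " OR ".join(f"txid{int(t)}[ORGN]" for t in taxids)
    return f"all[filter] NOT ({terms})"


def blast_excluding(fasta_text, taxids):
    """Submit a remote blastn search that filters out the given taxa."""
    # Imported here so exclusion_query can be used without Biopython installed.
    from Bio.Blast import NCBIWWW

    return NCBIWWW.qblast(
        "blastn",
        "nt",
        fasta_text,
        entrez_query=exclusion_query(taxids),  # assumed to mirror the web form field
    )
```

For example, `exclusion_query([9606])` builds a query that leaves out human sequences (taxonomy ID 9606), and `blast_excluding(rec.format("fasta"), [9606])` would pass it along with the search.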

Upvotes: 1
