I'm just getting started with Python and Biopython and don't have much programming experience. I'd appreciate any help you can give me.
I'm trying to extract CDS and/or rRNA sequences from GenBank. It's important that I get only the open reading frame, which is why I'm not just pulling the whole sequence. When I run the code below, it raises an error saying:
no records found in handle
on the line `record = SeqIO.read(handle, "genbank")`. I'm not sure how to fix this. I've included the code I'm using below.
Also, if there is an easier way of doing this, or published code that already does it, I'd appreciate it if you could let me know.
Thanks!
```python
from Bio import Entrez, SeqIO

# searchdb, searchterm, searchfeaturetype and searchgeneproduct are defined earlier;
# NCBI also asks you to set Entrez.email before making requests

# search sequences by a combination of keywords
# need to find (number of) results to set 'retmax' value
handle = Entrez.esearch(db=searchdb, term=searchterm)
records = Entrez.read(handle)
handle.close()

# repeat search with appropriate 'retmax' value
all_handle = Entrez.esearch(db=searchdb, term=searchterm, retmax=records['Count'])
records = Entrez.read(all_handle)
all_handle.close()

print(" ")
print("Number of sequences found:", records['Count'])  # printing to make sure the code is working so far
print(" ")

locations = []  # store locations of target sequences
sequences = []  # store target sequences
for i in range(len(records['IdList'])):
    handle = Entrez.efetch(db=searchdb, id=records['IdList'][i], rettype="gb", retmode="xml")
    record = SeqIO.read(handle, "genbank")
    handle.close()
    for feature in record.features:
        if feature.type == searchfeaturetype:  # searches features for proper feature type
            # features without a /product qualifier are treated as an empty product
            product = feature.qualifiers.get('product', [''])[0]
            if searchgeneproduct in product:  # searches features for proper gene product
                location = str(feature.location)
                if location not in locations:  # no repeat location entries
                    locations.append(location)  # appends location entry
                    sequences.append(feature.extract(record.seq))  # append sequence
```
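You are requesting XML from GenBank (`retmode = "xml"`), but `SeqIO.read(handle, "genbank")` expects the GenBank flat-file format. The XML response has no `LOCUS` line for the flat-file parser to start from, so it finds no records. Try changing your `efetch` line to request plain text:
```python
handle = Entrez.efetch(db=searchdb, id=records['IdList'][i], rettype="gb", retmode="text")
```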
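To see the difference without contacting NCBI, here is a minimal offline sketch, assuming a reasonably recent Biopython is installed. It passes `SeqIO.read` a made-up XML snippet (which reproduces the "no records found" error) and then a tiny hand-built GenBank flat-file record, and applies the same type/product filter as the loop in the question. The record `TEST0001`, its sequence and features, the XML snippet, and the helper names `make_flat_record` and `extract_features` are all hypothetical, for illustration only; the XML is not real NCBI output.

```python
from io import StringIO

from Bio import SeqIO


def make_flat_record():
    """Build a tiny, hypothetical GenBank flat-file record as text.

    The flat-file format is column-based, so each line is assembled with
    fixed-width formatting instead of hand-counted spaces.
    """
    seq = "ccc" + "atgaaatag" + "c" * 18  # 30 bp; the CDS covers bases 4..12
    blocks = " ".join(seq[i:i + 10] for i in range(0, len(seq), 10))
    # LOCUS columns: name, length, ' bp ', strand, molecule, topology, division, date
    locus = "{:<12}{:<16} {:>11} bp {:<3}{:<7} {:<8} {:<3} {}".format(
        "LOCUS", "TEST0001", len(seq), "", "DNA", "linear", "UNK", "01-JAN-1980")
    lines = [
        locus,
        "{:<12}{}".format("DEFINITION", "Hypothetical test record."),
        "{:<12}{}".format("ACCESSION", "TEST0001"),
        "{:<21}{}".format("FEATURES", "Location/Qualifiers"),
        "     {:<16}{}".format("CDS", "4..12"),  # feature key, then location at column 22
        " " * 21 + '/product="hypothetical protein"',
        "     {:<16}{}".format("rRNA", "13..30"),
        " " * 21 + '/product="16S ribosomal RNA"',
        "ORIGIN",
        "{:>9} {}".format(1, blocks),  # sequence line: position number, then 10-base blocks
        "//",
    ]
    return "\n".join(lines) + "\n"


def extract_features(record, feature_type, product_term):
    """Return (location, sequence) pairs for features matching type and product."""
    hits = []
    for feature in record.features:
        if feature.type != feature_type:
            continue
        product = feature.qualifiers.get("product", [""])[0]
        if product_term in product:
            hits.append((str(feature.location), feature.extract(record.seq)))
    return hits


# 1. XML text has no LOCUS line, so the "genbank" parser finds no records.
xml_text = '<?xml version="1.0"?>\n<GBSet><GBSeq></GBSeq></GBSet>\n'
try:
    SeqIO.read(StringIO(xml_text), "genbank")
except ValueError as err:
    print("XML input:", err)

# 2. Flat-file text (the kind retmode="text" asks for) parses normally.
record = SeqIO.read(StringIO(make_flat_record()), "genbank")
for location, sequence in extract_features(record, "CDS", "protein"):
    print(location, sequence)
```

In the real script, the only change needed is `retmode="text"` on `efetch`; a helper like `extract_features` would then run on each `record` returned by `SeqIO.read(handle, "genbank")`.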