Reputation: 199
I have the script below to count the codons listed in codon.list.csv within a gene file (test.fasta). However, it counts every occurrence of each codon regardless of reading frame. I would like to count codons only in frame 0 (ATG, TAT, TAT, TAA). For example:
>test1
ATGTATTATTAA
ATG:1 TAT:2 TAA:1
At the moment my script also counts TGT, ATT, TTA, etc., which I don't want.
I thought this would be easier, but I cannot get it to work.
Any advice would be great!
from Bio import SeqIO
mRNA_sequences = "test.fasta"
in_seq_handle = open(mRNA_sequences)
seq_dict = SeqIO.to_dict(SeqIO.parse(in_seq_handle, "fasta"))
in_seq_handle.close()
seq_dict_keys = seq_dict.keys()
dict_sequences2 = {}
dict_codons = {}
contig_file = open("codon.list.csv")
for line in contig_file:
    gene_id = line[0:3]  # the codon is the first three characters of the line
    for sequence in seq_dict.values():
        seqstring = sequence.seq
        # count() matches the codon at any offset, not only in frame 0
        if line[:-1] in dict_codons:
            dict_codons[line[:-1]] += seqstring.count(gene_id)
        else:
            dict_codons[line[:-1]] = seqstring.count(gene_id)
print(dict_codons)
Upvotes: 1
Views: 3900
Reputation: 2742
How about this:
a = 'ATGTATTATTAA'
# generator that steps through the sequence three bases at a time
codons = (a[n:n+3] for n in range(0, len(a), 3))
dict_codons = {}
for codon in codons:
    if codon in dict_codons:
        dict_codons[codon] += 1
    else:
        dict_codons[codon] = 1
print(dict_codons)
In short, this code creates a generator that yields the codons in frame 0, then tallies them in a dictionary.
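The same idea can be sketched a little more compactly with collections.Counter from the standard library, which replaces the manual dictionary bookkeeping. This is only an illustration written for Python 3: the function names count_codons_in_frame and count_listed_codons are made up for this example, and the sequences are plain strings (with Biopython you would presumably pass str(record.seq) for each record). The second function assumes the list of wanted codons has already been read from codon.list.csv.

```python
from collections import Counter


def count_codons_in_frame(seq, frame=0):
    """Count non-overlapping codons of seq, starting at offset frame."""
    seq = seq.upper()
    # Stop at the last complete triplet so 1-2 trailing bases are ignored.
    last_start = len(seq) - 2
    return Counter(seq[i:i + 3] for i in range(frame, last_start, 3))


def count_listed_codons(sequences, wanted):
    """Sum frame-0 counts over many sequences, keeping only codons in wanted."""
    totals = Counter()
    for seq in sequences:
        totals.update(count_codons_in_frame(seq))
    # A Counter returns 0 for codons that never occurred.
    return {codon: totals[codon] for codon in wanted}


counts = count_codons_in_frame("ATGTATTATTAA")
print(dict(counts))  # {'ATG': 1, 'TAT': 2, 'TAA': 1}

# Hypothetical input: two sequences and three codons of interest.
print(count_listed_codons(["ATGTATTATTAA", "ATGTAA"], ["ATG", "TAT", "TGT"]))
# {'ATG': 2, 'TAT': 2, 'TGT': 0}
```

TGT comes out as 0 here even though it appears in the first sequence at offset 1, because only frame 0 is read. Passing frame=1 or frame=2 would read the other forward frames.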
Upvotes: 2