Reputation: 777
I have a list of DNA sequences as given below. I would like to get the consensus sequence which is the highest frequency obtain in that particular position.
test.txt
>human
ATCAATTGCT
>human
GCTAGCTAGC
>human
GCTAGCTAGC
>human
GCTGATCGGC
>human
GCTTACAACG
Using the code below, I obtain the total A, C, G, and T from each position.
Code
from Bio import motifs
output=open("test_output.txt","a")
with open("test.txt") as handle:
motif = motifs.read(handle, 'sites')
output.write(str(motif.counts))
Example output
0 1 2 3 4 5 6 7 8 9
A: 1.00 0.00 0.00 3.00 3.00 0.00 1.00 3.00 0.00 0.00
C: 0.00 4.00 1.00 0.00 0.00 3.00 1.00 0.00 2.00 3.00
G: 4.00 0.00 0.00 1.00 2.00 0.00 0.00 2.00 3.00 1.00
T: 0.00 1.00 4.00 1.00 0.00 2.00 3.00 0.00 0.00 1.00
How can I get the output of each bases stated in the last column?
Desired output
0 1 2 3 4 5 6 7 8 9
A: 1.00 0.00 0.00 3.00 3.00 0.00 1.00 3.00 0.00 0.00
C: 0.00 4.00 1.00 0.00 0.00 3.00 1.00 0.00 2.00 3.00
G: 4.00 0.00 0.00 1.00 2.00 0.00 0.00 2.00 3.00 1.00
T: 0.00 1.00 4.00 1.00 0.00 2.00 3.00 0.00 0.00 1.00
G C T A A C T A G C
Upvotes: 0
Views: 65
Reputation: 1027
Motifs has a consensus method that does exactly what you want:
output.write("\t".join(list(motif.consensus)))
Upvotes: 2