Xiong89
Xiong89

Reputation: 777

Python: How to output the variable based on the maximum value obtained?

I have a list of DNA sequences as given below. I would like to get the consensus sequence which is the highest frequency obtain in that particular position.

test.txt

>human
ATCAATTGCT
>human
GCTAGCTAGC
>human
GCTAGCTAGC
>human
GCTGATCGGC
>human
GCTTACAACG

Using the code below, I obtain the total A, C, G, and T from each position.

Code

from Bio import motifs
output=open("test_output.txt","a")
with open("test.txt") as handle:
    motif = motifs.read(handle, 'sites')
    output.write(str(motif.counts))

Example output

        0      1      2      3      4      5      6      7      8      9
A:   1.00   0.00   0.00   3.00   3.00   0.00   1.00   3.00   0.00   0.00
C:   0.00   4.00   1.00   0.00   0.00   3.00   1.00   0.00   2.00   3.00
G:   4.00   0.00   0.00   1.00   2.00   0.00   0.00   2.00   3.00   1.00
T:   0.00   1.00   4.00   1.00   0.00   2.00   3.00   0.00   0.00   1.00

How can I get the output of each bases stated in the last column?

Desired output

    0      1      2      3      4      5      6      7      8      9
A:   1.00   0.00   0.00   3.00   3.00   0.00   1.00   3.00   0.00   0.00
C:   0.00   4.00   1.00   0.00   0.00   3.00   1.00   0.00   2.00   3.00
G:   4.00   0.00   0.00   1.00   2.00   0.00   0.00   2.00   3.00   1.00
T:   0.00   1.00   4.00   1.00   0.00   2.00   3.00   0.00   0.00   1.00
     G      C      T      A      A      C      T      A      G      C 

Upvotes: 0

Views: 65

Answers (1)

heathobrien
heathobrien

Reputation: 1027

Motifs has a consensus method that does exactly what you want:

 output.write("\t".join(list(motif.consensus)))

Upvotes: 2

Related Questions