Ana

Reputation: 141

How to insert headers in a dot plot frequency table with Python

I have code for producing a dot plot from two DNA sequences, but I don't know how to add the header. Can I have some help, please?

def dotplot(seqx, seqy):
    plot = [["" for x in seqx] for y in seqy]
    for xindex, xvalue in enumerate(seqx):
        for yindex, yvalue in enumerate(seqy):
            if xvalue == yvalue:
                plot[yindex][xindex] = "*"
    return plot

seqx = myseq
seqy = myseq2

dnaplot = dotplot(seqx, seqy)
for row in dnaplot:
    for column in row:
        print column,
    print

Desired output:

   a c t g c t g
 a *
 g       
 g       *      
 g                        
 a
 t            *

Upvotes: 0

Views: 148

Answers (1)

Oliver W.

Reputation: 13459

There are several ways to generate the table with headers. For example, you could create the table (which you called plot and which I've renamed to table in the code below) with one extra row and one extra column, and then fill that extra row and column with the letters of the nucleic acid sequences. This is shown here:

def dotplot(seqx,seqy):
    table = [[" "*3 for x in range(len(seqx) + 1)] for y in range(len(seqy) + 1)] # increased the size of an element in the table (3 spaces)

    # fill the column headers
    table[0][1:]  = [" %c " % (char,) for char in seqx]

    # fill the row headers:
    for row, char in zip(table[1:], seqy):
        row[0] = " %c " % (char,)

    # fill the content-part of the table
    for yindex, yvalue in  enumerate(seqy):
        for xindex, xvalue in  enumerate(seqx):
            if xvalue == yvalue: # your decision logic: when to add a star
                table[yindex + 1][xindex + 1] = " * " # Also added some spaces here
    return table

table = dotplot(seqx, seqy)
for row in table:
    print(''.join(row)) # chain together the elements in each `row` list

Notice that I have changed very little in the bulk of your function dotplot. Only the order of traversing the table has changed (first rows, then columns within each row), and an offset (e.g. yindex + 1) has been added to account for the extra header row and column.

Another way is to interleave the printing of the table with the printing of the row and column headers. In the code below I import print_function from __future__, which brings Python 3's print function to Python 2.7 and makes this a bit easier.

from __future__ import print_function

def dotplot(seqx,seqy):
    table = [[" "*3 for x in seqx] for y in seqy] # increased the size of an element in the table (3 spaces)

    # print the column headers:
    print(' '*3 + ''.join(" %c " % (char,) for char in seqx))  # adds leading whitespace (for the row headers) and the characters from `seqx` with extra whitespace

    # print the table
    for yindex, yvalue in  enumerate(seqy):
        for xindex, xvalue in  enumerate(seqx):
            if xindex == 0: # first column -> add the row headers
                print(' %c ' % (yvalue,), end='')
            if xvalue == yvalue: # your decision logic: when to add a star
                table[yindex][xindex] = " * " # Also added some spaces here
        print(''.join(table[yindex]))
    return table

Both approaches will print a table of this form:

    a  c  t  g  c  t  g 
 a  *                   
 g           *        * 
 g           *        * 
 g           *        * 
 a  *                   
 t        *        *    
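
For reference, here is a compact, self-contained sketch that builds the same table as a list of text lines rather than printing inside the function. The name format_dotplot is hypothetical, and the two sequences are an assumption, inferred from the headers in the desired output of the question.

```python
def format_dotplot(seqx, seqy):
    """Return the dot plot as a list of text lines, headers included."""
    # Header line: a blank 3-character corner cell, then one cell per column.
    lines = [" " * 3 + "".join(" %s " % base for base in seqx)]
    for yvalue in seqy:
        # One 3-character cell per comparison: a star on a match, blanks otherwise.
        cells = [" * " if xvalue == yvalue else " " * 3 for xvalue in seqx]
        # Each row starts with its own header cell.
        lines.append(" %s " % yvalue + "".join(cells))
    return lines


# Assumed sequences, read off the headers in the question's desired output.
for line in format_dotplot("actgctg", "agggat"):
    print(line)
```

Returning the lines instead of printing them keeps the formatting separate from the output, so the same result can be printed, written to a file, or checked in a test.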

I did not change the logic in your code which specifies when to add a star. At the moment that logic only checks for equality between the nucleobases (if xvalue == yvalue:), which is why you'll see stars every time there's a match. My guess is this is not what you want, so you should reconsider the logic in that case. But your question dealt with adding column and row headers for which I hope to have given you something to continue with.
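
The question does not say which rule should decide where a star goes, so the following is only one possible alternative, not necessarily what you are after. A common refinement for dot plots is to mark a position only when a short window of consecutive bases matches, which filters out many chance single-base matches. The function name window_matches and the window parameter are hypothetical.

```python
def window_matches(seqx, seqy, window=3):
    """Return the set of (yindex, xindex) positions where `window`
    consecutive bases of seqy and seqx are identical."""
    hits = set()
    for yindex in range(len(seqy) - window + 1):
        for xindex in range(len(seqx) - window + 1):
            # Compare whole slices instead of single characters.
            if seqx[xindex:xindex + window] == seqy[yindex:yindex + window]:
                hits.add((yindex, xindex))
    return hits


# With window=1 this reduces to the single-base equality test used above.
print(sorted(window_matches("actgctg", "ctgc", window=3)))
```

The returned positions can replace the `xvalue == yvalue` test when filling the table: put a star at table[yindex][xindex] (plus the header offset, if used) for every pair in the set.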

Also, if you work a lot with nucleic acid sequences, you should consider looking into Biopython.

Upvotes: 1
