Reputation: 141
I have code for producing a dot plot from two DNA sequences, but I don't know how to add the header. Can I have some help, please?
def dotplot(seqx, seqy):
    plot = [["" for x in seqx] for y in seqy]
    for xindex, xvalue in enumerate(seqx):
        for yindex, yvalue in enumerate(seqy):
            if xvalue == yvalue:
                plot[yindex][xindex] = "*"
    return plot

# myseq and myseq2 are the two DNA sequences (defined elsewhere)
seqx = myseq
seqy = myseq2
dnaplot = dotplot(seqx, seqy)
for row in dnaplot:
    for column in row:
        print column,
    print
Desired output:
a c t g c t g
a *
g
g *
g
a
t *
Upvotes: 0
Views: 148
Reputation: 13459
There are several ways to generate the table with headers. For example, you could create the table (which you called `plot` and I've renamed to `table` in the code below) with one extra row and one extra column, then fill that extra row and column with the letters from the nucleic acid sequences. This is shown here:
def dotplot(seqx, seqy):
    # each element is 3 spaces wide; one extra row and column hold the headers
    table = [[" " * 3 for x in range(len(seqx) + 1)] for y in range(len(seqy) + 1)]
    # fill the column headers
    table[0][1:] = [" %c " % (char,) for char in seqx]
    # fill the row headers
    for row, char in zip(table[1:], seqy):
        row[0] = " %c " % (char,)
    # fill the content part of the table
    for yindex, yvalue in enumerate(seqy):
        for xindex, xvalue in enumerate(seqx):
            if xvalue == yvalue:  # your decision logic: when to add a star
                table[yindex + 1][xindex + 1] = " * "  # also added some spaces here
    return table

table = dotplot(seqx, seqy)
for row in table:
    print(''.join(row))  # chain together the elements in each `row` list
Notice that I have changed very little in the bulk of your function `dotplot`. Only the order of traversing the table has changed (first rows, then columns within rows), and an offset (e.g. `yindex + 1`) has been added to account for the added row and column headers.
Another way would be to interleave printing the table with printing the row and column headers. In the code below I have imported the print function from Python 3, which makes this a bit easier in Python 2.7.
from __future__ import print_function

def dotplot(seqx, seqy):
    table = [[" " * 3 for x in seqx] for y in seqy]  # increased the size of an element in the table (3 spaces)
    # print the column headers:
    print(' ' * 3 + ''.join(" %c " % (char,) for char in seqx))  # adds leading whitespace (for the row headers) and the characters from `seqx` with extra whitespace
    # print the table
    for yindex, yvalue in enumerate(seqy):
        for xindex, xvalue in enumerate(seqx):
            if xindex == 0:  # first column -> add the row headers
                print(' %c ' % (yvalue,), end='')
            if xvalue == yvalue:  # your decision logic: when to add a star
                table[yindex][xindex] = " * "  # Also added some spaces here
        print(''.join(table[yindex]))
    return table
Both approaches will print a table of this form:
    a  c  t  g  c  t  g
 a  *
 g           *        *
 g           *        *
 g           *        *
 a  *
 t        *        *
I did not change the logic in your code that decides when to add a star. At the moment that logic only checks for equality between the nucleobases (`if xvalue == yvalue:`), which is why you'll see a star every time there's a match. My guess is that this is not what you want, in which case you should reconsider that logic. But your question was about adding column and row headers, and I hope this gives you something to continue with.
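If the problem is that single-base matches add too much noise, one option is to mark a cell only when a short run of consecutive bases matches. This is only a guess at what you might be after, not something your question states. The function name `windowed_dotplot` and the `window` parameter are my own, hypothetical names. A minimal sketch that returns the formatted lines instead of printing them:

```python
def windowed_dotplot(seqx, seqy, window=2):
    """Return the dot plot as a list of text lines, with headers.

    Hypothetical variant of `dotplot`: a star is placed at (y, x) only if
    the `window` bases starting at seqx[x] equal those starting at seqy[y].
    With window=1 this is the same equality check as before.
    """
    # column headers, preceded by blank space for the row-header column
    lines = [" " * 3 + "".join(" %s " % base for base in seqx)]
    for y, ybase in enumerate(seqy):
        cells = []
        for x in range(len(seqx)):
            xs = seqx[x:x + window]
            ys = seqy[y:y + window]
            # near the end of a sequence the slice is too short: no star
            match = len(xs) == window and xs == ys
            cells.append(" * " if match else " " * 3)
        lines.append(" %s " % ybase + "".join(cells))
    # drop trailing whitespace so the lines are easier to compare
    return [line.rstrip() for line in lines]


for line in windowed_dotplot("actgctg", "ctgat", window=2):
    print(line)
```

A larger `window` gives fewer stars; whether that is the right filter depends on what the plot is meant to show.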
Also, if you're working a lot with nucleic acid sequences, you should consider looking into BioPython.
Upvotes: 1