sheaph

Reputation: 199

Python, Bioinformatics query

I am new to Python and would like to know whether what I am attempting is possible. I have a section of a DNA alignment, and for each gap ("-") on the bottom line I would like to identify the nucleotide at the same position on the top line. Here I would expect to get "G".

My efforts so far have not been successful. The alignment is:

ATTCAGGCCTAGCA
:::::  :: ::::
ATTCAA-CCAAGCA

I appreciate any assistance!

Upvotes: 1

Views: 369

Answers (4)

yuriy babin

Reputation: 11

You'd be better off using the Biopython library. It has many data types designed for manipulating DNA, RNA and protein sequences (alignments, trees, etc.). In this case, AlignIO from the Biopython package should help.

from Bio import AlignIO

# Read the aligned sequences (all rows must have the same length).
alignment = AlignIO.read("my_seq.fa", "fasta")

# Every row has the same length, so this is the number of columns.
cols = alignment.get_alignment_length()

# Rows and columns are indexed much like a NumPy array: alignment[row, col].
for col in range(cols):
    if alignment[1, col] == "-":
        # Gap in the second row: print the nucleotide from the first row.
        print(alignment[0, col])

Upvotes: 1

hello_there_andy

Reputation: 2083

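above = 'ATTCAGGCCTAGCA'
below = 'ATTCAA-CCAAGCA'
# Collect the letter in `above` at every position where `below` has a gap.
gap_letters = [above[i] for i, j in enumerate(below) if j == '-']
# gap_letters == ['G']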

Upvotes: 1

Pines

Reputation: 396

I'm not sure how your data is saved. Let's say it's two equal-length strings in a tuple:

dna_pair = ('ATTCAGGCCTAGCA','ATTCAA-CCAAGCA')

Then you could try:

def find_align(dna_pair):
    for i in range(len(dna_pair[0])):
        if dna_pair[1][i] == '-':
            return dna_pair[0][i]
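
Note that the function above returns as soon as it reaches the first gap. If the alignment may contain several gaps, one possible variant is a generator that yields the top-line nucleotide for each of them. This is only a sketch under the same assumption (two equal-length strings in a tuple); the name `iter_gap_bases` and the second test pair are made up for illustration.

```python
def iter_gap_bases(dna_pair, gap="-"):
    """Yield the top-line base for every gap character in the bottom line."""
    top, bottom = dna_pair
    # zip walks both strings in step, one alignment column at a time.
    for top_base, bottom_base in zip(top, bottom):
        if bottom_base == gap:
            yield top_base


dna_pair = ("ATTCAGGCCTAGCA", "ATTCAA-CCAAGCA")
print(list(iter_gap_bases(dna_pair)))               # ['G']

# Hypothetical pair with two gaps, to show that every gap is reported.
print(list(iter_gap_bases(("ACGTAC", "A-GT-C"))))   # ['C', 'A']
```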

Upvotes: 1

asalic

Reputation: 949

As I don't have any information about the data format, I'll describe the general process. Create two lists from the first and last lines respectively (which I assume are aligned and have the same length) and iterate over them together. At each step, check whether the character at the current position in the last list is a '-', and if so, print the character at the same position in the first list.
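
The process described above could be sketched roughly as follows, assuming the alignment is available as a block of text laid out as in the question (top sequence, match line, bottom sequence). The function name `top_bases_at_gaps` is hypothetical, and real alignment files may need a proper parser instead.

```python
alignment_text = """ATTCAGGCCTAGCA
:::::  :: ::::
ATTCAA-CCAAGCA"""


def top_bases_at_gaps(text, gap="-"):
    """Return the first-line characters at positions where the last line has a gap."""
    # Ignore blank lines so a trailing newline does not become the "last line".
    lines = [line for line in text.splitlines() if line.strip()]
    top, bottom = lines[0], lines[-1]
    if len(top) != len(bottom):
        raise ValueError("first and last lines must have the same length")
    # Walk both lines position by position and keep the top character at each gap.
    return [t for t, b in zip(top, bottom) if b == gap]


print(top_bases_at_gaps(alignment_text))  # ['G']
```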

Upvotes: 1
