Checking for specific amino acids in specific positions in a multiple sequence alignment

Question

There is a similar question on Stack Overflow, but it uses the Linux terminal (Search for specific characters in specific positions of line). I would like to do the same thing in Python, and I can't quite work out a Pythonic way to do it without writing out every membership check by hand.

I would like to search for specific amino acids at specific positions of a multiple sequence alignment. I have defined the alignment positions as a list of indexes,

e.g. Index = [1, 100, 235, 500]

I have also defined the amino acids I want at each of those positions:

Res1 = ["A","G"]
Res2 = ["T","F"]
Res3 = ["S","W"]
Res4 = ["H","J"]

I am currently doing something like this:

for m in records_dict:
    seq = records_dict[m].seq
    # Print the record ID only if every position holds an allowed residue.
    if (seq[Index[0]] in Res1 and seq[Index[1]] in Res2
            and seq[Index[2]] in Res3 and seq[Index[3]] in Res4):
        print(m)

Now suppose I have 40 positions to check. I know I have to write out the lists of allowed residues by hand, but surely there is an easier way to do the membership checks, with a loop or something similar, than chaining 40 conditions together.
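To make the idea concrete, here is a minimal sketch of what such a loop might look like: the positions and the allowed-residue lists are paired with zip() and tested with all(). The Record namedtuple, the matches_all helper, the Allowed list, and the short toy sequences are all hypothetical stand-ins (the real records_dict presumably holds Biopython records with a .seq attribute), and the positions are treated as 0-based Python indexes.

```python
from collections import namedtuple

# Hypothetical stand-in for a sequence record; only a .seq attribute is assumed.
Record = namedtuple("Record", ["seq"])


def matches_all(seq, positions, allowed):
    """Return True if seq holds an allowed residue at every listed position."""
    return all(seq[pos] in residues for pos, residues in zip(positions, allowed))


# Toy data (assumption): 0-based positions and one set of residues per position.
Index = [1, 3, 5, 7]
Allowed = [{"A", "G"}, {"T", "F"}, {"S", "W"}, {"H", "J"}]

records_dict = {
    "seq1": Record("MATTQSLHK"),
    "seq2": Record("MGCFQWLJK"),
    "seq3": Record("MACTQSLKK"),  # fails the last check (K at position 7)
}

# Keep the IDs of the records that pass every membership check.
hits = [name for name, rec in records_dict.items()
        if matches_all(rec.seq, Index, Allowed)]
print(hits)
```

Keeping the allowed residues in a single list that runs parallel to Index means going from 4 positions to 40 only changes the data, not the checking code.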

Also, is there a way to handle the case where no sequence passes all 40 checks? I would like to get the 5 sequences that come closest to matching, with output such as: sequence "m" has 30/40 matches, followed by a list of which 30 positions matched and which 10 did not.
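A rough sketch of the kind of ranking described here, under the same assumptions as the question (0-based positions, records exposing a .seq attribute): each record is scored by how many checks it passes, the matched and missed positions are kept, and the top n records are returned. Record, score_record, best_matches, Allowed, and the toy sequences are hypothetical names and data used only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a sequence record; only a .seq attribute is assumed.
Record = namedtuple("Record", ["seq"])


def score_record(seq, positions, allowed):
    """Split the checked positions into (matched, missed) lists of (position, residue)."""
    matched, missed = [], []
    for pos, residues in zip(positions, allowed):
        target = matched if seq[pos] in residues else missed
        target.append((pos, seq[pos]))
    return matched, missed


def best_matches(records, positions, allowed, n=5):
    """Return up to n (name, matched, missed) tuples, most matches first."""
    scored = [(name,) + score_record(rec.seq, positions, allowed)
              for name, rec in records.items()]
    # The sort is stable, so ties keep the dictionary's order.
    scored.sort(key=lambda item: len(item[1]), reverse=True)
    return scored[:n]


# Toy data (assumption): 0-based positions and one set of residues per position.
Index = [1, 3, 5, 7]
Allowed = [{"A", "G"}, {"T", "F"}, {"S", "W"}, {"H", "J"}]
records_dict = {
    "seq1": Record("MATTQSLHK"),  # passes 4 of 4 checks
    "seq2": Record("MGCKQWLKK"),  # passes 2 of 4 checks
    "seq3": Record("MACTQSLKK"),  # passes 3 of 4 checks
}

ranked = best_matches(records_dict, Index, Allowed, n=5)
for name, matched, missed in ranked:
    print("%s has %d/%d matches; matched: %s; missed: %s"
          % (name, len(matched), len(Index), matched, missed))
```

The same function covers the exact-match case too: a record that passes everything simply has an empty missed list.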
