Reputation: 75
There is a similar question on Stack Overflow, but it is using the Linux terminal (Search for specific characters in specific positions of line). I would like to do a similar thing using python, and I cant quite work out what is the pythonic way to do this without having to manually write for the membership checks.
I would like to search for specific amino acids at specific positions of a multiple sequence alignment. I have defined the positions of the amino acid alignment in a list of indexes ,
e.g Index = [1, 100, 235, 500].
I have defined the amino acids I want in those positions also.
Res1 = ["A","G"]
Res2 = ["T","F"]
Res3 = ["S,"W"]
Res4 = ["H","J"]
I am currently doing something like this :
for m in records_dict:
if (records_dict[m].seq[Index[0]] \
in Res1) and (records_dict[m].seq[Index[1]] \
in Res2) and (records_dict[m].seq[Index[2]] \
in Res3) and (records_dict[m].seq[Index[3]]\
in Res4)
print m
Now, suppose I have a list of 40 residues I want to check, I know I have to write the lists of residues to check manually, but surely, there is an easier way to do this membership check using a while loop or something else.
Also, is there any way I could incorporate a system where if no sequences match all the 40 membership checks, I would get the 5 best sequences that are closest to matching all 40 checks, and an output such as sequence "m" has 30/40 matches and a list of what those 30 matches are, and which 10 didn't match?
Upvotes: 0
Views: 902
Reputation: 5140
I will assume you want to check if Res1
is at Index[0]
, Res2
at Index[1]
and so on.
res = [Res1, Res2, Res3, Res4]
for m in records_dist:
match = 0
match_log = []
for i in Index:
if records_dict[m].seq[i] in res[i]:
match += 1
match_log.append(i)
With this little code, you're able to compute the number of match, and track the indexes where match occurs for each records_dist
values.
If you want to check if ResX
is at multiple position, or if you don't want to order the indexes list like the Res list, I'd define a dictionary of list where keys are ResX
and values are list of indexes:
to_check = {}
to_check[Res1] = [index1, index2]
to_check[Res2] = [index1, ..., indexN]
...
to_check[ResX] = [indexI, ..., indexJ]
then, using
match_log = {}
for m in records_dist:
match_log[m] = {}
for res, indexes in to_check:
match_log[m][res] = []
for i in indexes:
if records_dict[m].seq[i] in res:
match_log[m][res].append(i)
nb_match = len(match_log[m][res])
or in a more pythonic way, using filter
:
match_log = {}
for m in records_dist:
match_log[m] = {}
for res, indexes in to_check:
match_log[m][res] = filter(lamba i: records_dict[m].seq[i] in res, indexes)
nb_match = len(match_log[m][res])
Upvotes: 1