Reputation: 575
I'm trying to find a way to calculate/determine the closest string match from a list of strings.
Here is the string that I want to find the closest match to:
CTGGAG
From a list of strings:
matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']
I've tried using the SequenceMatcher from difflib:
from difflib import SequenceMatcher

seqratiomatchlist = []
for t in matchlist:
    assignseqmatch = SequenceMatcher(None, 'CTGGAG', t)
    ratio = assignseqmatch.ratio()
    seqratiomatchlist.append(ratio)
for r, s in zip(seqratiomatchlist, neutralhex):
    neutralmatchscores[r].append(s)
However, when I use this method, the first four values in the list are all reported to have the same ratio (0.833333), when the third and fourth values in the list should have the highest ratio, since there is only a one-letter difference between CTGGAG, CTGGAA, and CTGGTG. I basically just want to calculate how many letter changes there are between the two strings. Is this possible?
Upvotes: 1
Views: 1961
Reputation: 20336
To find the number of letter changes between two equal-length strings, x and y, do the following:
numChanges = sum(i != j for i, j in zip(x, y))
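As a rough sketch of how that one-liner could be applied to the list from the question: the helper name num_changes, the length check, and the variables target, changes, and closest are my own additions for illustration, not something from the original post. It counts the differing positions for each candidate and then orders the non-identical candidates from fewest changes to most.

```python
def num_changes(x, y):
    """Count the positions at which two equal-length strings differ."""
    # zip() silently truncates to the shorter string, so reject unequal lengths.
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(i != j for i, j in zip(x, y))

target = 'CTGGAG'
matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']

# Map each candidate to its number of differing positions.
changes = {s: num_changes(target, s) for s in matchlist}

# Candidates other than the exact match, fewest changes first.
closest = sorted((s for s in matchlist if s != target), key=changes.get)

print(changes)
print(closest)
```

Note that this only compares characters position by position, so a string that is merely shifted by one letter (such as ACTGGA) counts as very different. If insertions and deletions should also be treated as single changes, an edit-distance measure would probably be a better fit.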
Upvotes: 2