cosmictypist

Reputation: 575

Calculating closest string match from a list of strings

I'm trying to find a way to calculate/determine the closest string match from a list of strings.

Here is the string that I want to find the closest match to: CTGGAG

From a list of strings:

matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']

I've tried using the SequenceMatcher from difflib:

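from difflib import SequenceMatcher

seqratiomatchlist = []
for t in matchlist:
    # Compare the target string with each candidate in the list
    assignseqmatch = SequenceMatcher(None, 'CTGGAG', t)
    ratio = assignseqmatch.ratio()
    seqratiomatchlist.append(ratio)
    # neutralhex and neutralmatchscores are not shown here
    for r, s in zip(seqratiomatchlist, neutralhex):
        neutralmatchscores[r].append(s)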

However, when I use this method, the first four values in the list all report the same ratio (0.833333), whereas the third and fourth values should score highest, since CTGGAA and CTGGTG each differ from CTGGAG by only one letter. I basically just want to count how many letter changes there are between two strings. Is this possible?

Upvotes: 1

Views: 1961

Answers (1)

zondo

Reputation: 20336

To find the number of letter changes between two equal-length strings, x and y, do the following:

numChanges = sum(i != j for i, j in zip(x, y))
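As a rough sketch of how this could be applied to the list from the question, the one-liner can be wrapped in a small helper and used as the key for `min`. The names `hamming_distance`, `target`, `distances`, and `closest` are illustrative assumptions, not part of the original answer. The length check is included because `zip` silently stops at the shorter string, which would undercount differences for strings of unequal length.

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length strings differ."""
    if len(x) != len(y):
        # zip() would silently truncate to the shorter string otherwise
        raise ValueError("strings must have the same length")
    return sum(i != j for i, j in zip(x, y))


target = 'CTGGAG'
matchlist = ['ACTGGA', 'CTGGAG', 'CTGGAA', 'CTGGTG', 'ACCGGT']

# Number of letter changes from the target to each candidate
distances = {s: hamming_distance(target, s) for s in matchlist}

# The candidate with the fewest changes (the first one wins on ties)
closest = min(matchlist, key=lambda s: hamming_distance(target, s))
```

Note that this counts substitutions position by position only. A string such as ACTGGA, which is the target shifted by one letter, scores as far away under this measure even though `SequenceMatcher` rates it as fairly similar.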

Upvotes: 2
