Reputation: 13
I am doing a medical study and the following algorithm, so simple to humans, I found hard to implement.
The situation:
We have an array of increasing natural numbers. A reader tries to read out loud those numbers in order. They might say a wrong number or skip a number. The last number will always be correct. The reader's response will be another array. My goal is to figure out how many mistakes the reader makes given the two arrays.
For example: given
[1,2,3,4,5,6]
the reader responded
[1,2,3,4,4,6]
One mistake (reading 4 on the 5 slot).
the reader responded [1,2,3,5,6]
. One mistake (one skip)
the reader responded [1,2,3,3,6]
. Two mistakes (1 skip and 1 wrong)
I have tried to use adjacency match but it fails easily when the reader says the wrong number and skips a number of times consecutively.
How would you implement such an algorithm? As suggested in the comments, Levenshtein distance with some alteration seems to solve this problem. Damerau_Levenshtein is not necessary because the reader is unlikely to swap answers.
Upvotes: 1
Views: 164
Reputation: 2416
As pointed by others , this problem can be solved in ways similar to standard 'Edit Distance' problem in DP method. Wiki reference of the problem will give more details.
The changes needed to suite your need will be to allow only 2 operations out of 3 operations (Insert, Delete, Substitute).
Operations mapped for your problem:
A python implementation below should be a starting point.
def get_mistakes(s1, s2):
m = [[0 for j in range(len(s2)+1)] for i in range(len(s1)+1)]
for i in range(len(s1)+1):
for j in range(len(s2)+1):
if i == 0:
m[i][j] = sum(bytearray(s2[:j]))
elif j == 0:
m[i][j] = sum(bytearray(s1[:i]))
elif s1[i-1] == s2[j-1]:
m[i][j] = m[i-1][j-1]
else:
skip = 1
substitute = 1
m[i][j] = min(m[i-1][j-1] + substitute, m[i][j-1] + skip)
return m[len(s1)][len(s2)]
Hope it Helps!
Upvotes: 1