Reputation: 8320
Let TARGET be a set of strings that I expect to be spoken.
Let SOURCE be the set of strings returned by a speech recognizer (that is, the possible sentences that it has heard).
I need a way to choose a string from TARGET. I read about the Levenshtein distance and the Damerau-Levenshtein distance, which basically return the distance between a source string and a target string, that is, the number of changes needed to transform the source string into the target string.
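For reference, this is the standard dynamic-programming formulation of the Levenshtein distance I am referring to (a minimal sketch):

```python
def levenshtein(source: str, target: str) -> int:
    """Number of single-character insertions, deletions and
    substitutions needed to turn source into target."""
    # Keep only one row of the classic dynamic-programming table.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```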
But how can I apply this algorithm to a set of target strings?
I thought I'd use the following method:
For each string of TARGET, I calculate the distance from each string in SOURCE. In this way we obtain an m-by-n matrix, where n is the cardinality of SOURCE and m is the cardinality of TARGET. We could say that the i-th row represents the similarity of the sentences detected by the speech recognizer with respect to the i-th target.
For each row I calculate the average of its values; let's call it average_on_row(i), where i is the row index. For each row I also calculate a score based on the standard deviation of its values; let's call it standard_deviation_sum(i). Each standard_deviation_sum(i) therefore refers to a string of TARGET.
The string which is associated with the smallest standard_deviation_sum could be the sentence pronounced by the user. Can the method I used be considered correct? Or are there other methods?
Obviously, excessively high values would indicate that the sentence pronounced by the user probably does not belong to TARGET.
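In code, the idea would look roughly like this (a sketch; levenshtein is the function above, and the way I combine the average and the standard deviation is only my current guess):

```python
from statistics import mean, pstdev

def choose_target(sources, targets):
    """Build the distance matrix row by row (one row per target)
    and pick the target whose row score is smallest."""
    best_target, best_score = None, float("inf")
    for target in targets:
        row = [levenshtein(s, target) for s in sources]
        avg = mean(row)        # average_on_row(i)
        spread = pstdev(row)   # spread of the distances in row i
        score = avg + spread   # my guess at standard_deviation_sum(i)
        if score < best_score:
            best_target, best_score = target, score
    return best_target, best_score
```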
Upvotes: 3
Views: 1984
Reputation: 7817
You need to calculate these probabilities first: the probability of insertion, the probability of deletion and the probability of substitution. Then use the log of these probabilities as penalties for each operation.
In a "context-independent" situation, if pi is the probability of insertion, pd the probability of deletion and ps the probability of substitution, then the probability of observing the same symbol is pp = 1 - ps - pd.
In this case use log(pi/pp/k), log(pd/pp) and log(ps/pp/(k-1)) as the penalties for insertion, deletion and substitution respectively, where k is the number of symbols in the system.
Essentially, if you use this distance measure between a source and a target, you get the log probability of observing that target given the source. If you have a bunch of training data (i.e. source-target pairs), choose some initial estimates for these probabilities, align the source-target pairs, and re-estimate the probabilities (i.e. an EM strategy).
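A sketch of such a weighted edit distance at the character level, assuming the context-independent penalties above; pi, pd, ps and the alphabet size k are placeholders you would estimate from training data:

```python
from math import log

def log_prob_distance(source, target, pi, pd, ps, k):
    """Weighted edit distance: the result is a log-probability score
    (up to an additive constant) of observing `target` given `source`,
    under a context-independent insertion/deletion/substitution model."""
    pp = 1 - ps - pd                  # probability of a correct match
    ins = log(pi / pp / k)            # insertion penalty
    dele = log(pd / pp)               # deletion penalty
    sub = log(ps / pp / (k - 1))      # substitution penalty

    m, n = len(source), len(target)
    # dp[i][j] = best score aligning source[:i] with target[:j]
    dp = [[float("-inf")] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i < m:                 # delete source[i]
                dp[i + 1][j] = max(dp[i + 1][j], dp[i][j] + dele)
            if j < n:                 # insert target[j]
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + ins)
            if i < m and j < n:       # match (cost 0) or substitute
                step = 0.0 if source[i] == target[j] else sub
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], dp[i][j] + step)
    return dp[m][n]
```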
You can start with one set of probabilities and assume context independence. Later you can assume some kind of clustering among the contexts (e.g. assume there are k different sets of letters whose substitution rates differ...).
Upvotes: 1
Reputation: 2579
I'm not an expert, but your proposal does not make sense. First of all, in practice I'd expect the cardinality of TARGET to be very large, if not infinite. Second, I don't believe the Levenshtein distance or any similar string-similarity metric will be useful.
If the cardinalities of SOURCE and TARGET are small and every string is equally probable, then I believe your best bet would be to find the pair p in SOURCE, q in TARGET such that distance(p,q) is minimal. Especially since you cannot guarantee the equal-probability part, I think you should think about the problem from scratch, do some research and come up with a completely different design. The usual methodology for speech recognition is to use Hidden Markov Models. I would start from there.
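That exhaustive search is tiny to write; distance can be plain Levenshtein or any weighted variant (a sketch, sensible only while both sets are small):

```python
from itertools import product

def best_pair(sources, targets, distance):
    """Return the (p, q) pair with the minimum distance.
    Exhaustive, so only viable for small SOURCE and TARGET sets."""
    return min(product(sources, targets),
               key=lambda pair: distance(*pair))
```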
Answer to your comment: Choose whichever is more probable. If you don't consider probabilities, it is hopeless.
[Suppose the following example is on phonemes, not characters]
Suppose the recognized word is "chees" and the target set is {"cheese", "chess"}. You must calculate P(cheese|chees) and P(chess|chees). What I'm trying to say is that not every substitution is equiprobable. If you model probabilities as distances between strings, then at the very least you must allow that, for example, d("c","s") < d("c","q"). (It is common to confuse the letters c and s, but it is not common to confuse c and q.) Adapting the distance-calculation algorithm is easy; coming up with good values for all pairs is difficult.
Also, you must somehow estimate P(cheese|context) and P(chess|context). If we are talking about board games, chess is more probable; if we are talking about dairy products, cheese is more probable. This is why you'll need large amounts of data to come up with such estimates, and it is also why Hidden Markov Models are well suited to this kind of problem.
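As a toy illustration of combining the two ingredients, the sketch below scores each target with log P(target|context) + log P(heard|target), using a hand-made confusion table so that confusing c with s is cheaper than confusing c with q. Every number here is invented and would have to be estimated from data:

```python
from math import log

# Invented channel model: log P(heard_char | intended_char) for a few
# confusable pairs; any other mismatch gets a flat low probability.
LOG_CONFUSION = {("c", "s"): log(0.10), ("s", "c"): log(0.10)}
DEFAULT_SUB, MATCH, GAP = log(0.001), log(0.90), log(0.01)

def log_channel(intended, heard):
    """Crude log P(heard | intended): characters compared position by
    position after padding; a real system would align with edit operations."""
    width = max(len(intended), len(heard))
    total = 0.0
    for x, y in zip(intended.ljust(width), heard.ljust(width)):
        if x == y:
            total += MATCH
        elif " " in (x, y):
            total += GAP                          # missing/extra character
        else:
            total += LOG_CONFUSION.get((x, y), DEFAULT_SUB)
    return total

def most_probable(heard, log_priors):
    """argmax over targets of log P(target|context) + log P(heard|target)."""
    return max(log_priors,
               key=lambda t: log_priors[t] + log_channel(t, heard))

# In a dairy-products context:
priors = {"cheese": log(0.8), "chess": log(0.2)}
print(most_probable("chees", priors))  # -> cheese
```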
Upvotes: 4