convert similar sound word parts

Question

I'm having trouble searching for the right terms here to solve the below problem; I'm sure it's a done thing, I just can't find the right terms to express the problem!

I'm basically trying to create a classifier that will take word comparison outputs (e.g. some outputs from Levenstein distances) and decide whether the words are sufficiently different. An important input would probably be something like a soundex comparison. The trouble I'm having is creating the training set for the algorithm (an SVM in this case). I have a long list of names and I need to mutate them a bit (based on similar sounds within the word).

E.g. John and Jon would be a mutation to make, and I could label this in the test set as being equivalent. John and Johann have sufficiently different sound and letter distance to be considered different.

So I'm kinda asking for is a way to achieve a phoneme variation generator, but need to be able to retain the English lettering structure.

Even simple translation might suffice, like "f" could (sometimes) be replaced by "ph". I'm doing this in Java so any tips in that direction would be great too! Thanks.

EDIT

This is the closest I've come across so far: http://www.isi.edu/natural-language/people/hovy/papers/07IJCAI-spelling-variants.pdf

convert similar sound word parts

Answers (1)

Related Questions