Reputation: 420
My problem is:
I have full names in which some names are concatenated, like "davidrobert jones". I want to split them, giving "david robert jones".
I tried longest-prefix matching against a dictionary of names, but it's not that simple because a name can be written in many ways. I also added a phonetic matching algorithm, but many names share the same pronunciation, so the results are very ambiguous.
What is the best way to do this? I believe machine learning could have an answer, but I don't know much about machine learning.
Upvotes: 0
Views: 238
Reputation: 4318
One possible algorithmic solution is to build a larger compositional dictionary representing all possible first_name last_name combinations. Then, given a name as a list of tokens (words separated by spaces), find for each token the dictionary entries with the shortest edit distance to that token.
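A rough sketch of that lookup in Python, just to make the idea concrete. The name list, the helper names (`build_dictionary`, `closest_entry`, `split_full_name`) and the choice of plain Levenshtein distance are my own assumptions for illustration, not a tested solution for real data:

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def build_dictionary(names):
    """Single names plus every ordered pair joined by a space."""
    entries = set(names)
    entries.update(f"{a} {b}" for a, b in permutations(names, 2))
    return sorted(entries)


def closest_entry(token, entries):
    """Dictionary entry with the smallest edit distance to the token."""
    return min(entries, key=lambda entry: edit_distance(token, entry))


def split_full_name(full_name, entries):
    """Replace each space-separated token by its closest dictionary entry."""
    return " ".join(closest_entry(tok, entries) for tok in full_name.split())


# Hypothetical toy name list; a real one would be far larger.
NAMES = ["david", "robert", "jones", "rob", "bert"]
ENTRIES = build_dictionary(NAMES)

print(split_full_name("davidrobert jones", ENTRIES))
```

Here "davidrobert" is one insertion (the space) away from the entry "david robert", so that entry wins. Note that the pair dictionary grows quadratically with the number of names, and ties between equally close entries are broken arbitrarily (by sort order in this sketch), so the ambiguity you describe would still have to be handled somehow.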
Upvotes: 0
Reputation: 9937
I think your problem is similar to Named Entity Recognition (NER). NER labels sequences of words in a text that are the names of things, such as person and company names. Section 5 of this article has a Python method for Named Entity Recognition.
Upvotes: 1