Mohamed Seif

Reputation: 420

An algorithm to split concatenated names

My problem:

I have full names in which some of the names are concatenated, like "davidrobert jones". I want to split them into separate names, e.g. "david robert jones".

I tried a longest-prefix-matching algorithm with a dictionary of names, but it's not that simple, because a name can be spelled in many ways. I added a phonetic matching algorithm too, but many names share the same pronunciation, so the results are very ambiguous.
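The greedy longest-prefix approach, and one way it breaks, can be sketched as follows. This is a minimal sketch assuming a hypothetical toy dictionary `NAMES`; `greedy_longest_prefix` and `all_segmentations` are illustrative names. The exhaustive search (memoized backtracking that returns every possible split) is a swapped-in alternative for comparison, not something from the question.

```python
from functools import lru_cache

# Hypothetical toy dictionary; a real one would contain many thousands of names.
NAMES = {"ann", "anna", "abel", "mary", "maryann", "david", "robert", "jones"}


def greedy_longest_prefix(token, names=NAMES):
    """Split token by repeatedly taking the longest dictionary name that prefixes the remainder.

    Returns the list of names, or None if the greedy choice leaves an unmatched remainder.
    """
    max_len = max(len(n) for n in names)
    parts, i = [], 0
    while i < len(token):
        # Try the longest candidate first, shrinking one character at a time.
        for j in range(min(len(token), i + max_len), i, -1):
            if token[i:j] in names:
                parts.append(token[i:j])
                i = j
                break
        else:
            return None  # no dictionary name starts at position i
    return parts


def all_segmentations(token, names=NAMES):
    """Return every way to split token into a sequence of dictionary names."""

    @lru_cache(maxsize=None)
    def split_from(i):
        # Memoized: all segmentations of token[i:], as a list of name lists.
        if i == len(token):
            return [[]]
        results = []
        for j in range(i + 1, len(token) + 1):
            if token[i:j] in names:
                for rest in split_from(j):
                    results.append([token[i:j]] + rest)
        return results

    return split_from(0)


print(greedy_longest_prefix("davidrobert"))  # ['david', 'robert']
print(greedy_longest_prefix("annabel"))      # None: "anna" is taken, "bel" is not a name
print(all_segmentations("annabel"))          # [['ann', 'abel']]
print(all_segmentations("maryann"))          # [['mary', 'ann'], ['maryann']]
```

On "annabel" the greedy match grabs "anna" and is left with "bel", which is not in the dictionary, while the exhaustive search finds "ann abel". On "maryann" the exhaustive search returns two valid candidates, which shows that a dictionary alone cannot decide between them; some extra signal (for example, name frequencies) would still be needed to rank the candidates.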

What is the best way to do this? I believe machine learning could provide an answer, but I don't know much about machine learning.

Upvotes: 0

Views: 238

Answers (2)

Mehdi

Reputation: 4318

One possible algorithmic solution is to build a larger compositional dictionary containing all possible first_name last_name combinations. Then, for a given name split into tokens (words separated by spaces), find, for each token, all dictionary entries with the shortest edit distance to that token.
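A minimal sketch of this idea, assuming a hypothetical toy list `GIVEN_NAMES` and plain Levenshtein distance (the answer does not say which edit distance to use). The compositional dictionary maps each concatenated pair to its spaced form, and `closest_entries` (an illustrative name) returns every entry tied for the smallest distance.

```python
import itertools

# Hypothetical toy list of names; a real system would load a large names dictionary.
GIVEN_NAMES = ["david", "dave", "robert", "mary", "ann"]

# Compositional dictionary: every ordered pair of distinct names, keyed by the
# concatenated spelling a user might type and mapped to the spaced form.
COMPOSITE = {a + b: f"{a} {b}" for a, b in itertools.permutations(GIVEN_NAMES, 2)}


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        prev = curr
    return prev[-1]


def closest_entries(token, composite=COMPOSITE):
    """Return (best distance, sorted spaced forms of all entries at that distance)."""
    scored = [(levenshtein(token, key), spaced) for key, spaced in composite.items()]
    best = min(d for d, _ in scored)
    return best, sorted(spaced for d, spaced in scored if d == best)


print(closest_entries("davidrobert"))  # (0, ['david robert'])
print(closest_entries("davidrobrt"))   # (1, ['david robert']), tolerates a missing letter
```

This is a brute-force scan over every entry. Because the compositional dictionary grows with the square of the name list, a real system would probably need an index suited to edit-distance search (for example a BK-tree) instead of a linear scan.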

Upvotes: 0

Ali Soltani

Reputation: 9937

I think your problem is similar to Named Entity Recognition (NER). NER labels sequences of words in a text that are the names of things, such as person and company names. Section 5 of this article presents a Python method for Named Entity Recognition.

Upvotes: 1
