Austin

Reputation: 844

Regular Expression or Database to Check if a String is a Person's Name?

I have an application that reads XML information about a vehicle title and parses it into my application. In my database, I always store names according to whether they belong to an individual or a company (because both can occur in my system). The trouble is that the XML source has name data, but it does not specify whether the name is an individual's or a company's. I need to know so I can store it appropriately in my database. Is there a database of names, a regular expression, or a library that could check the string to see if it matches an individual's name? Thanks!

Upvotes: 4

Views: 6097

Answers (5)

Efrain Perez

Reputation: 63

I suggest using a machine learning algorithm. You can use supervised learning to train a model that gives you the probability that a string is a first name, a last name, or, even better, a person's name at all. I suggest the Naive Bayes algorithm. I recommend this approach because we had this issue at my work and I solved it with machine learning.

You can use these datasets of names to train your model:

https://mbejda.github.io/

You will then have a very precise model for detecting whether a word is a person's name.

I recommend Python and the scikit-learn library.
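As a rough illustration of the Naive Bayes idea, here is a minimal sketch. It is a dependency-free, word-level Naive Bayes with add-one smoothing written in plain Python, not the scikit-learn API, and the training names, the `person`/`company` labels, and the whitespace tokenization are all made-up assumptions. A real model would be trained on a large dataset such as the one linked above.

```python
import math
from collections import Counter, defaultdict


def tokens(name):
    # Assumption: lowercase, whitespace-separated words are the features.
    return name.lower().split()


class NaiveBayesNameClassifier:
    """Toy multinomial Naive Bayes over word tokens (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for name, label in examples:
            self.label_counts[label] += 1
            for tok in tokens(name):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict_proba(self, name):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens(name):
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Normalize the log scores into probabilities.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}

    def predict(self, name):
        probs = self.predict_proba(name)
        return max(probs, key=probs.get)


# Hypothetical training data, far too small for real use.
TRAINING = [
    ("John Smith", "person"),
    ("Mary Johnson", "person"),
    ("Robert Brown", "person"),
    ("Linda Davis", "person"),
    ("Acme Motors LLC", "company"),
    ("Davis Auto Sales Ltd", "company"),
    ("Brown Leasing LLC", "company"),
    ("Smith Fleet Services Ltd", "company"),
]

clf = NaiveBayesNameClassifier()
clf.fit(TRAINING)
```

Calling `clf.predict_proba(name)` gives a probability per label, so ambiguous names can be sent to manual review instead of being forced into one category.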

Hope this helps.

Please ask me if you have any problems.

Best regards.

Upvotes: 0

t0mm13b

Reputation: 34592

Well, personal names usually have a first and last name separated by a space. Companies, on the other hand, would have Ltd (Limited), PLC (public limited company), or LLC (limited liability company, a type of company under US regulations)... Am I going off the beaten track here? If last_name and first_name are empty, check the company field, and vice versa. It seems you have put the combination of the two into one field, which makes it harder to do.
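The suffix check described above could be sketched like this. The function name `has_company_suffix` is hypothetical, and the suffix set contains only the markers mentioned in this answer; a real list would need many more entries (and other countries' forms).

```python
import re

# Only the markers mentioned above; extend as needed (assumption: the
# marker appears as the last word of the name).
COMPANY_SUFFIXES = {"ltd", "limited", "plc", "llc"}


def has_company_suffix(name: str) -> bool:
    # Drop periods so "L.L.C." and "Ltd." compare equal to "llc" and "ltd".
    cleaned = name.replace(".", "").lower()
    words = re.findall(r"[a-z]+", cleaned)
    return bool(words) and words[-1] in COMPANY_SUFFIXES
```

A name without one of these markers is not necessarily a person, so this only works as a one-way signal: a hit suggests a company, a miss says nothing.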

Upvotes: 0

Philip Schlump

Reputation: 3144

At a large telco that I used to work for, we had this problem. We tested the following regular expression on more than 2 million names:

([A-Z][a-z][a-z]*)  *([A-Z][a-z]*)\.?  *([A-Z][a-z][a-z][a-z]*)

We got 99.8% accuracy with this. The data was fairly clean. This was for a regular expression engine in C, so the syntax may be a little off from Perl. I don't know if you will need the parentheses.
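For reference, here is a minimal sketch of the same pattern in Python's `re` module. The pattern is copied unchanged; the helper name `looks_like_person` and the choice to anchor with `fullmatch` (so the whole string has to match) are assumptions, since the original does not say how the expression was anchored.

```python
import re

# Pattern copied from above; a doubled space followed by * means
# "one or more spaces".
NAME_PATTERN = re.compile(
    r"([A-Z][a-z][a-z]*)  *([A-Z][a-z]*)\.?  *([A-Z][a-z][a-z][a-z]*)"
)


def looks_like_person(name: str) -> bool:
    # fullmatch requires the entire (stripped) string to fit the pattern.
    return NAME_PATTERN.fullmatch(name.strip()) is not None
```

Note that, as written, the pattern expects three parts (first name, middle name or initial, last name), so a two-part name like "John Smith" does not match, while a three-word company name in title case would.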

Upvotes: 7

Brienne Schroth

Reputation: 2457

No, there is no way to know. Are you dealing with Frank Zappa's child, Moon Unit, or are you dealing with Moon Unit, your number one source for real moon rock memorabilia? Names can be anything, and company names can be anything (including the names of their owners!). The only way to know for sure which one it is, is for that information to be supplied to you.

Upvotes: 5

Matthew Jones

Reputation: 26190

You are going to be hard-pressed to find one. Individual names, in particular, are often limited only by imagination. However, if you need one, may I suggest gathering a list of all car manufacturers that your application cares about and checking the XML name data against this list. If a match is found, the name is a company; if not, you can assume the name is an individual.
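A minimal sketch of that lookup, assuming an exact match after normalizing case, punctuation, and whitespace. The entries in `KNOWN_COMPANIES` and the function names are illustrative placeholders; the real list would come from whatever companies your application cares about.

```python
import re

# Illustrative entries only; fill in the list your application needs.
KNOWN_COMPANIES = {
    "ford motor company",
    "toyota motor corporation",
}


def normalize(name: str) -> str:
    # Lowercase, drop punctuation, and collapse runs of whitespace.
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def classify_by_list(name: str, known=KNOWN_COMPANIES) -> str:
    # Anything not on the list is assumed to be an individual.
    return "company" if normalize(name) in known else "individual"
```

Because the fallback is "individual", any company missing from the list is misclassified, so the list needs to cover every company that can appear in the XML feed.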

Upvotes: 1
