Reputation: 110267
Assuming the Western naming convention of FirstName MiddleName(s) LastName, what would be the best way to correctly parse out the last name from a full name?
For example:
John Smith --> 'Smith'
John Maxwell Smith --> 'Smith'
John Smith Jr --> 'Smith Jr'
John van Damme --> 'van Damme'
John Smith, IV --> 'Smith, IV'
John Mark Del La Hoya --> 'Del La Hoya'
...and the countless other permutations from this.
Upvotes: 4
Views: 1848
Reputation: 697
I came across a library called "nameparser" at https://pypi.python.org/pypi/nameparser. It handles four of the six cases above:
#!/usr/bin/env python
from nameparser import HumanName

def get_lname(somename):
    name = HumanName(somename)
    return name.last

people_names = [
    ('John Smith', 'Smith'),
    ('John Maxwell Smith', 'Smith'),
    # ('John Smith Jr', 'Smith Jr'),
    ('John van Damme', 'van Damme'),
    # ('John Smith, IV', 'Smith, IV'),
    ('John Mark Del La Hoya', 'Del La Hoya')
]

for name, target in people_names:
    print('{} --> {} <-- {}'.format(name, get_lname(name), target))
    assert get_lname(name) == target
Upvotes: 1
Reputation:
I'm seconding Tnekutippa here, but you should check out named entity recognition. It might help automate some of the process. This is, however, as noted, quite difficult. I'm not sure whether the Stanford NER can extract first and last names out of the box, but a machine learning approach could prove very useful for this task. The Stanford NER could be a nice starting point, or you could try to build your own classifiers and training corpora.
Upvotes: 0
Reputation: 1695
Probably the best answer here is not to try. Names are individual and idiosyncratic and, even limiting yourself to the Western tradition, you can never be sure that you'll have thought of all the edge cases. A friend of mine legally changed his name to be a single word, and he's had a hell of a time dealing with various institutions whose procedures can't deal with this. You're in the unique position of being the one creating the software that implements a procedure, and so you have an opportunity to design something that isn't going to annoy the crap out of people with unconventional names. Think about why you need to be parsing out the last name to begin with, and see if there's something else you could do.
That being said, as a purely technical matter, the best way would probably be to trim off specifically the strings " Jr", ", Jr", ", Jr.", "III", ", III", etc. from the end of the string containing the name, and then take everything from the last space in the trimmed string to its end. This wouldn't get, say, "Del La Hoya" from your example, but you can't really count on even a human to get that. I'm making an educated guess that John Mark Del La Hoya's last name is "Del La Hoya" and not "Mark Del La Hoya" because I'm a native English speaker and I have some intuition about what Spanish last names look like. If the name were, say, "Gauthip Yeidze Ka Illunyepsi", I would have absolutely no idea whether to count that "Ka" as part of the last name, because I have no idea what language that's from.
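A minimal sketch of the trim-then-split approach described above, using only the standard library. The suffix list and the names SUFFIXES and last_name are my own assumptions for illustration and are nowhere near exhaustive:

```python
import re

# Hypothetical, deliberately incomplete list of generational suffixes.
SUFFIXES = ("III", "II", "IV", "Jr", "Sr")

# An optional comma, whitespace, one suffix, and an optional period,
# anchored to the end of the string.
_SUFFIX_RE = re.compile(r",?\s+(?:" + "|".join(SUFFIXES) + r")\.?$")


def last_name(full_name):
    """Strip one trailing suffix, then return the text after the last space."""
    trimmed = _SUFFIX_RE.sub("", full_name.strip())
    # rpartition leaves the whole string in the last slot when there is no space,
    # so a single-word name is returned unchanged.
    return trimmed.rpartition(" ")[2]


if __name__ == "__main__":
    for person in ("John Smith", "John Smith Jr", "John Smith, IV", "John van Damme"):
        print("{} --> {}".format(person, last_name(person)))
```

Note that this drops the suffix instead of keeping it (so "John Smith Jr" gives "Smith", not "Smith Jr"), and it returns only "Damme" for "John van Damme", which is exactly the multi-word limitation mentioned above.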
Upvotes: 17