Reputation: 151
I am trying to map users from different systems based on their first and last names in Python.
One issue is that the first names are in many cases nicknames. For example, a user's first name may be 'Dave' in one system and 'David' in another.
Is there an easy way in Python to convert common nicknames like these to their formal counterparts?
Thanks!
Upvotes: 4
Views: 4577
Reputation: 4084
In [1]: first_name_dict = {'David': ['Dave']}
In [2]: def get_real_first_name(name):
   ...:     for first_name in first_name_dict:
   ...:         if first_name == name:
   ...:             return name
   ...:         elif name in first_name_dict[first_name]:
   ...:             return first_name
   ...:     # No entry matched after checking every formal name, so return the input unchanged.
   ...:     return name
   ...:
In [3]: get_real_first_name('David')
Out[3]: 'David'
In [4]: get_real_first_name('Dave')
Out[4]: 'David'
I'm using IPython. Basically, you need a dictionary to do that, and first_name_dict is that dictionary: it maps each formal first name to a list of its nicknames. For example, if David can be called "Dave" or "Davy", and Lucas can be called "Luke", you can write the dictionary like this:
first_name_dict = {'David' : ['Dave', 'Davy'], 'Lucas' : ['Luke']}
You can improve the solution by adding case-insensitive matching.
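One way the case-insensitive matching could look, as a minimal sketch that assumes each nickname belongs to only one formal name: invert the dictionary once into a lowercase lookup table, so each call is a single dictionary lookup rather than a loop. The names build_lookup and NICKNAME_LOOKUP are made up for this example.

```python
first_name_dict = {'David': ['Dave', 'Davy'], 'Lucas': ['Luke']}


def build_lookup(name_dict):
    """Invert {formal: [nicknames]} into {lowercase name: formal}."""
    lookup = {}
    for formal, nicknames in name_dict.items():
        # A formal name maps to itself so 'david' and 'DAVID' both resolve.
        lookup[formal.lower()] = formal
        for nick in nicknames:
            lookup[nick.lower()] = formal
    return lookup


# Hypothetical module-level table, built once and reused by every call.
NICKNAME_LOOKUP = build_lookup(first_name_dict)


def get_real_first_name(name):
    """Return the formal name for a nickname, or the input if it is unknown."""
    return NICKNAME_LOOKUP.get(name.strip().lower(), name)


print(get_real_first_name('dave'))   # David
print(get_real_first_name('LUKE'))   # Lucas
print(get_real_first_name('Alice'))  # Alice
```

If a nickname is listed under two formal names, the later entry silently wins here, so this only suits unambiguous lists.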
Upvotes: 0
Reputation: 17072
Not within Python specifically, but try using this:
http://deron.meranda.us/data/nicknames.txt
If you load that data into Python (csv.reader(<FileObject>, delimiter='\t')), you can then write a weighted-probability function that returns a full name for the nicknames in that list.
You could do something like this:
import collections
import random

def weighted_choice_sub(weights):
    # Source for this function:
    # http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/
    # Returns an index chosen with probability proportional to its weight.
    rnd = random.random() * sum(weights)
    for i, w in enumerate(weights):
        rnd -= w
        if rnd < 0:
            return i

def load_names(filename):
    # filename is the path to your local copy of the nicknames file.
    # Maps each nickname to a list of (full name, weight) tuples.
    outdict = collections.defaultdict(list)
    with open(filename, 'r') as infile:
        for line in infile:
            tmp = line.strip().split('\t')
            outdict[tmp[0]].append((tmp[1], float(tmp[2])))
    return outdict

def full_name(nickname, names):
    # names is the dictionary returned by load_names(); load it once and reuse it.
    candidates = names[nickname]
    return candidates[weighted_choice_sub([x[1] for x in candidates])][0]
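A weighted random choice can return a different full name each time it runs, which may be awkward when matching users across systems. A deterministic variant, sketched below, always picks the highest-weighted candidate. This swaps the random draw for a plain max and is not part of the code above. It assumes the same three tab-separated columns (nickname, full name, weight); the sample rows and the name most_likely_name are invented for illustration.

```python
import collections
import io

# Made-up sample rows in the assumed nickname<TAB>name<TAB>weight layout.
SAMPLE = "dave\tdavid\t0.9\ndave\tdavis\t0.1\nluke\tlucas\t0.7\nluke\tluther\t0.3\n"


def load_names(infile):
    """Read tab-separated rows into {nickname: [(name, weight), ...]}."""
    names = collections.defaultdict(list)
    for line in infile:
        fields = line.strip().split('\t')
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        names[fields[0]].append((fields[1], float(fields[2])))
    return names


def most_likely_name(nickname, names):
    """Return the highest-weighted full name, or the nickname if it is unknown."""
    candidates = names.get(nickname.lower())
    if not candidates:
        return nickname
    return max(candidates, key=lambda pair: pair[1])[0]


names = load_names(io.StringIO(SAMPLE))
print(most_likely_name('Dave', names))  # david
print(most_likely_name('Zed', names))   # Zed
```

Loading the table once and passing it in also avoids re-reading the file on every lookup.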
Upvotes: 5
Reputation: 3830
You'd have to create a database or hash mapping nicknames onto formal names. If you can find such a list online, implementing the map will be trivial. The real fun will be getting a complete enough list, ensuring variations are taken care of, and making sure you don't run into problems when people's formal names ARE their nicknames. Not everyone who goes by Dave has a formal name of David, for example; the person's formal name may very well be Dave.
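One way to cope with formal names that are themselves nicknames, sketched under the assumption that you only need to decide whether two first names could refer to the same person: instead of converting a nickname to a single formal name, compare sets of candidates, and always keep the name itself as a candidate. The table NICKNAME_TO_FORMAL and both function names are hypothetical.

```python
# Hypothetical nickname table: each nickname lists the formal names it may stand for.
NICKNAME_TO_FORMAL = {
    'dave': {'david'},
    'luke': {'lucas', 'luther'},
}


def candidate_names(first_name):
    """Return every name this first name could stand for, itself included."""
    key = first_name.strip().lower()
    # Keeping the name itself covers people whose formal name really is 'Dave'.
    return {key} | NICKNAME_TO_FORMAL.get(key, set())


def could_be_same_first_name(a, b):
    """True if the two first names share at least one candidate."""
    return bool(candidate_names(a) & candidate_names(b))


print(could_be_same_first_name('Dave', 'David'))  # True
print(could_be_same_first_name('Dave', 'dave'))   # True
print(could_be_same_first_name('Dave', 'Lucas'))  # False
```

This treats a first-name match as a hint to combine with the last name, not as proof that two records belong to the same user.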
Upvotes: 0