Reputation: 325
I wrote a method that helps to match names that represent the same person but are written in different ways (full name or short version), for example:
Paul Samuelson-Smith
and Paul Smith
would be considered equal based on my method:
private static boolean equalName(String name_2, String name_1) {
    // First direction: every part of name_1 must occur somewhere in name_2.
    boolean equality1 = true;
    name_1 = name_1.replace("&", " ").replace("-", " ");
    String[] names1 = name_1.split(" ");
    for (int i = 0; i < names1.length; i++) {
        if (!name_2.contains(names1[i])) {
            equality1 = false;
            break;
        }
    }
    // Second direction: every part of name_2 must occur somewhere in name_1.
    boolean equality2 = true;
    name_2 = name_2.replace("&", " ").replace("-", " ");
    String[] names2 = name_2.split(" ");
    for (int i = 0; i < names2.length; i++) {
        if (!name_1.contains(names2[i])) {
            equality2 = false;
            break;
        }
    }
    // The names match if either one is fully contained in the other.
    return equality1 || equality2;
}
However, I still have a problem when a name contains a typo. Say Paul Samuelson-Smith
and Paull Smith
are the same person. Is there any API that would help account for possible typos? How can I improve my method?
Upvotes: 4
Views: 188
Reputation: 28951
The algorithm you need should return more than just true/false. E.g. when you compare 'Paula Smith', 'Paul Smith' and 'Paul Saumelson-Smith', you should choose the best match. Have a look here: http://www.katkovonline.com/2006/11/java-fuzzy-string-matching/ but note that it is better suited to classification, i.e. when you need to work on a large database and choose the best matches.
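To illustrate the idea of scoring instead of returning true/false, here is a minimal sketch that rates each candidate with a normalized Levenshtein similarity and picks the highest one. It is not taken from the linked article. The class name NameSimilarity, the methods similarity and bestMatch, and the query "Paul Smyth" are all made up for this example.

```java
import java.util.Arrays;
import java.util.List;

class NameSimilarity {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions and substitutions each cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            // Swap rows so that prev always holds the most recent row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Score in [0, 1]: 1.0 means identical strings.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Returns the candidate with the highest score, or null for an empty list.
    // On a tie the earlier candidate wins.
    static String bestMatch(String query, List<String> candidates) {
        String best = null;
        double bestScore = -1.0;
        for (String c : candidates) {
            double s = similarity(query.toLowerCase(), c.toLowerCase());
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> candidates =
                Arrays.asList("Paula Smith", "Paul Smith", "Paul Saumelson-Smith");
        System.out.println(bestMatch("Paul Smyth", candidates)); // prints: Paul Smith
    }
}
```

Note that a whole-string score like this can tie: "Paull Smith" is one edit away from both 'Paula Smith' and 'Paul Smith', so in practice you would probably want a threshold or a secondary rule on top of it.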
Upvotes: 1
Reputation: 2758
Here is a library that has a few distance algorithms built in: http://sourceforge.net/projects/simmetrics/
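To show how a distance metric could plug into the method from the question, here is a rough sketch that compares the names part by part and tolerates a small number of edits per part. It does not use the SimMetrics API. The distance function below is a hand-written stand-in (optimal string alignment, which counts a swap of two adjacent letters as one edit), and the names FuzzyNames, fuzzyEqualName and maxEdits are assumptions made for this example. A metric from a library could be substituted for distance.

```java
class FuzzyNames {

    // Optimal string alignment distance: Levenshtein plus
    // transposition of two adjacent characters as a single edit.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // Split a name into lower-case parts on whitespace, '&' and '-'.
    static String[] tokens(String name) {
        return name.trim().toLowerCase().split("[\\s&-]+");
    }

    // True if every part in 'parts' is within maxEdits of some part in 'others'.
    static boolean allPartsMatch(String[] parts, String[] others, int maxEdits) {
        for (String p : parts) {
            boolean found = false;
            for (String o : others) {
                if (distance(p, o) <= maxEdits) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Same "either name is contained in the other" rule as in the question,
    // but with a per-part edit tolerance instead of String.contains.
    static boolean fuzzyEqualName(String name1, String name2, int maxEdits) {
        String[] t1 = tokens(name1);
        String[] t2 = tokens(name2);
        return allPartsMatch(t1, t2, maxEdits) || allPartsMatch(t2, t1, maxEdits);
    }

    public static void main(String[] args) {
        System.out.println(fuzzyEqualName("Paul Samuelson-Smith", "Paull Smith", 1));
        System.out.println(fuzzyEqualName("Paul Smith", "John Smith", 1));
    }
}
```

With maxEdits set to 1 this also treats 'Paula Smith' and 'Paul Smith' as the same person, and very short parts match too easily, so the tolerance probably needs tuning against real data.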
Upvotes: 4