Aleksei Nikolaevich

Reputation: 325

How to spot nearly identical strings?

I wrote a method that matches names that refer to the same person but are written differently (full name or short version).

For example, Paul Samuelson-Smith and Paul Smith are considered equal by my method:

// Returns true if every word of one name occurs in the other name,
// e.g. "Paul Smith" vs "Paul Samuelson-Smith".
private static boolean equalName(String name1, String name2) {
    // Treat '&' and '-' as word separators
    String n1 = name1.replace("&", " ").replace("-", " ");
    String n2 = name2.replace("&", " ").replace("-", " ");
    return allWordsContained(n1, n2) || allWordsContained(n2, n1);
}

// True if each whitespace-separated word of 'words' occurs (as a substring) in 'text'
private static boolean allWordsContained(String words, String text) {
    // split("\\s+") also copes with repeated spaces, e.g. from "Paul & Mary"
    for (String word : words.trim().split("\\s+")) {
        if (!text.contains(word)) {
            return false;
        }
    }
    return true;
}

However, I still have a problem when a name contains a typo: for example, Paul Samuelson-Smith and Paull Smith are the same person. Is there an API that would help account for possible typos? How can I improve my method?

Upvotes: 4

Views: 188

Answers (2)

kan

Reputation: 28951

The algorithm you need should return a score rather than just true/false. For example, when you compare 'Paula Smith', 'Paul Smith' and 'Paul Samuelson-Smith', you should choose the best match. Have a look here: http://www.katkovonline.com/2006/11/java-fuzzy-string-matching/ but note that it is better suited to classification, i.e. when you need to work on a large database and choose the best matches.
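A minimal sketch of the scoring idea, assuming a Levenshtein-based similarity (the linked article may use a different algorithm). Each word of the query is scored against its closest word in the candidate, the scores are averaged, and the highest-scoring candidate wins. The class and method names (`BestNameMatch`, `score`, `bestMatch`) are illustrative only, and lowercasing the names is an added assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class BestNameMatch {

    // Classic dynamic-programming Levenshtein distance: the minimum number of
    // single-character insertions, deletions and substitutions.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1 means identical words.
    static double wordSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Same separators as the question ('&' and '-'), plus lowercasing (assumption).
    static String[] words(String name) {
        return name.toLowerCase(Locale.ROOT).replace("&", " ").replace("-", " ").trim().split("\\s+");
    }

    // Score in [0, 1]: average, over the query's words, of each word's best
    // similarity to any word of the candidate.
    static double score(String query, String candidate) {
        String[] qWords = words(query);
        String[] cWords = words(candidate);
        double total = 0;
        for (String q : qWords) {
            double best = 0;
            for (String c : cWords) best = Math.max(best, wordSimilarity(q, c));
            total += best;
        }
        return total / qWords.length;
    }

    // Returns the highest-scoring candidate (the first one on ties, null if empty).
    static String bestMatch(String query, List<String> candidates) {
        String best = null;
        double bestScore = -1;
        for (String c : candidates) {
            double s = score(query, c);
            if (s > bestScore) { bestScore = s; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> people = Arrays.asList("Paula Smith", "Paul Samuelson-Smith");
        System.out.println(bestMatch("Paul Smith", people)); // prints Paul Samuelson-Smith
        System.out.println(score("Paull Smith", "Paul Samuelson-Smith"));
    }
}
```

The score is deliberately asymmetric: it measures how well the query's words are covered by the candidate, so a short name can still fully match a longer one, as in the question. With the typo "Paull Smith", both "Paula Smith" and "Paul Samuelson-Smith" score 0.9, which is exactly the kind of ambiguity a ranking exposes and a yes/no answer hides.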

Upvotes: 1

Amir T

Reputation: 2758

Possible duplicate

Here is a library with several string distance algorithms built in: http://sourceforge.net/projects/simmetrics/
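As an illustration of the kind of measure such libraries provide, here is a hand-written Jaro-Winkler sketch (this is not the SimMetrics API) in its basic form (prefix scale 0.1, common prefix capped at 4 characters), plugged into a fuzzy variant of the question's `equalName`. The names `FuzzyNames` and `fuzzyEqualName` and the 0.9 word threshold are assumptions for illustration; the threshold would need tuning on real data.

```java
import java.util.Locale;

class FuzzyNames {

    // Hypothetical threshold: two words count as "the same" at or above this score.
    static final double WORD_THRESHOLD = 0.9;

    // Jaro similarity in [0, 1].
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) return 1.0;
        int len1 = s1.length(), len2 = s2.length();
        if (len1 == 0 || len2 == 0) return 0.0;
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] m1 = new boolean[len1], m2 = new boolean[len2];
        int matches = 0;
        // Count characters that match within the allowed window
        for (int i = 0; i < len1; i++) {
            int end = Math.min(len2 - 1, i + window);
            for (int j = Math.max(0, i - window); j <= end; j++) {
                if (!m2[j] && s1.charAt(i) == s2.charAt(j)) {
                    m1[i] = true; m2[j] = true; matches++; break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Matched characters that appear in a different order (transpositions)
        int outOfOrder = 0, k = 0;
        for (int i = 0; i < len1; i++) {
            if (!m1[i]) continue;
            while (!m2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / len1 + m / len2 + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1 - j);
    }

    static String[] words(String name) {
        return name.toLowerCase(Locale.ROOT).replace("&", " ").replace("-", " ").trim().split("\\s+");
    }

    // Fuzzy variant of equalName: every word of one name must resemble some word of the other.
    static boolean fuzzyEqualName(String name1, String name2) {
        String[] w1 = words(name1), w2 = words(name2);
        return allWordsMatch(w1, w2) || allWordsMatch(w2, w1);
    }

    static boolean allWordsMatch(String[] words, String[] other) {
        for (String w : words) {
            boolean found = false;
            for (String o : other) {
                if (jaroWinkler(w, o) >= WORD_THRESHOLD) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyEqualName("Paul Samuelson-Smith", "Paull Smith")); // prints true
        System.out.println(fuzzyEqualName("John Smith", "Paul Smith"));            // prints false
    }
}
```

This also treats "Paula Smith" and "Paul Smith" as equal (their first names score about 0.96), so any fixed threshold trades missed typos against false matches.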

Upvotes: 4
