Raam
Raam

Reputation: 10866

Identifying closest match to given string

My requirement is to be able to match two strings that are similar but not an exact match. For example, given the following strings

The output should be FirstName, FName and Last Name, LName as they are a logical match. Are there any libraries that I could use to do this? I am using JAVA to achieve this functionality.

Thanks Raam

Upvotes: 1

Views: 6512

Answers (5)

CupawnTae
CupawnTae

Reputation: 14580

You could use Apache Commons StringUtils...

http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#getLevenshteinDistance(java.lang.CharSequence,%20java.lang.CharSequence)

But it's worth noting that this may not be the best algorithm for the specific use-case in the question - I recommend reading some of the other answers here for more ideas.

Upvotes: 4

Ashish Shetkar
Ashish Shetkar

Reputation: 1467

StringUtils is simply best for this - this is one of the examples i found on stackOverflow - as @CupawnTae said already

Below is the one of the simple example i came across

public static Object getTheClosestMatch(Collection<?> collection, Object target) {
    int distance = Integer.MAX_VALUE;
    Object closest = null;
    for (Object compareObject : collection) {
        int currentDistance = StringUtils.getLevenshteinDistance(compareObject.toString(), target.toString());
        if(currentDistance < distance) {
            distance = currentDistance;
            closest = compareObject;
        }
    }
    return closest;
}

Upvotes: 2

user2566092
user2566092

Reputation: 4661

According to the example you gave, you should use a modified Levenshtein distance where the penalty for adding spaces is small and the penalty for mismatched characters is larger. This will handle matching abbreviations to the strings that were abbreviated quite well. However that's assuming that you're mainly dealing with aligning abbreviations to corresponding longer versions of the strings. You should elaborate more exactly what kind of matchings you want to perform (e.g. more examples, or some kind of high-level description) if you want a more detailed and pointed answer about what methods you can/should use.

Upvotes: 2

wckd
wckd

Reputation: 410

The spell check algorithms use a variant of this algorithm. http://en.wikipedia.org/wiki/Levenshtein_distance. I implemented it in class for a project and it was fairly simple to do so. If you don't want to implement it yourself you can use the name to search for other libraries.

Upvotes: 1

joe cool
joe cool

Reputation: 55

An answer to a really similar question to yours can be found here.

Also, wikipedia has an article on Approximate String Matching that can be found here. If the first link isn't what you're looking for, I would suggest reading the wikipedia article and digging through the sources to find what you need.

Sorry I can't personally be of more help to you, but I really hope that these resources can help you find what you're looking for!

Upvotes: 1

Related Questions