MisterIbbs
MisterIbbs

Reputation: 257

Intelligent String matching in Java possible?

I am currently coding a simple string matcher which checks if two strings are the same. I want to implement an intelligent way of doing this so that it can recognize when the majority of characters do match, with some space for error.

For example, a comparison between the words "program" and "prgoram" can be deemed a match, as it would intelligently allow typos. But the words "horse" and "esroh" would be detected as a mismatch.

Is there something in Java which I can readily use to achieve this, or is it a case of writing a custom method with a plethora of different checks?

Upvotes: 0

Views: 1991

Answers (2)

Jesmeen Hoque
Jesmeen Hoque

Reputation: 411

I know its old question. But this may help others.

Use "Levenshtein's Edit Distance as a Fuzzy String Match" Java has a Library in Apache Commons. But in-case you cant get the library or may need for other development purpose (Such as android), here is a Levenshtein.java code.. Details about Fuzzy String match

public class Levenshtein
{
public Levenshtein()
{
    super();
}

public double compare(final String s1, final String s2)
{
    double retval = 0.0;
    final int n = s1.length();
    final int m = s2.length();
    if (0 == n)
    {
        retval = m;
    }
    else if (0 == m)
    {
        retval = n;
    }
    else
    {
        retval = 1.0 - (compare(s1, n, s2, m) / (Math.max(n, m)));
    }
    return retval;
}

private double compare(final String s1, final int n, 
                       final String s2, final int m)
{
    int matrix[][] = new int[n + 1][m + 1];
    for (int i = 0; i <= n; i++)
    {
        matrix[i][0] = i;
    }
    for (int i = 0; i <= m; i++)
    {
        matrix[0][i] = i;
    }

    for (int i = 1; i <= n; i++)
    {
        int s1i = s1.codePointAt(i - 1);
        for (int j = 1; j <= m; j++)
        {
            int s2j = s2.codePointAt(j - 1);
            final int cost = s1i == s2j ? 0 : 1;
            matrix[i][j] = min3(matrix[i - 1][j] + 1, 
                                matrix[i][j - 1] + 1, 
                                matrix[i - 1][j - 1] + cost);
        }
    }
    return matrix[n][m];
}

private int min3(final int a, final int b, final int c)
{
    return Math.min(Math.min(a, b), c);
}
}

Just call from your main class and use the Double value for further work.

Levenshtein x=new Levenshtein();
Double n=x.compare("My nam s jesmeen", "my name");

Upvotes: 0

Jurgen Camilleri
Jurgen Camilleri

Reputation: 3589

You can use an algorithm like Levenshtein string distance to achieve this. The algorithm gives you the number of steps needed to change one string to another, so the less steps needed, the more similar the strings are.

As a readily available library I recommend StringUtils found in Apache Commons. You can check it out here.

Upvotes: 4

Related Questions