Comparing two strings using known algorithms

I'm trying to compare two strings (product names) using some well-known algorithms, such as Levenshtein distance, and a library of string similarity metrics (I got the best results with the SmithWatermanGotoh algorithm).

The two strings are:

iPhone 3gs 32 GB black

Apple iPhone 3 gs 16GB black

Levenshtein works pretty badly on the whole string if some words are in a different order (which is expected given how the algorithm works), so I tried to implement a word-by-word comparison.
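To illustrate the word-order issue, here is a minimal sketch of a plain Levenshtein distance in Java. The class name LevenshteinSketch and the similarity normalization (1 minus distance divided by the longer length) are assumptions made for this example, not the library code used above.

```java
final class LevenshteinSketch {

    // Classic dynamic-programming edit distance, keeping only two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, lower for more edits.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(a, b) / max;
    }

    public static void main(String[] args) {
        // Same words, different order: the score drops even though nothing is missing.
        System.out.println(similarity("iphone black", "black iphone"));
        // A single inserted space costs exactly one edit.
        System.out.println(distance("3gs", "3 gs"));
    }
}
```

Because the distance is computed character by character over the whole string, swapping two words counts as many edits, which is why a per-word comparison looked more promising.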

The problem I'm facing is how to detect similar 'words' that are split by a space character ('3gs'->'3 gs' ; '32 GB'->'16GB').

My code compares the shorter string (by word count; if the counts are equal, by str.length) with the longer one. Words are split into an ArrayList<String>. I combine each word from str1 with the others in the same string, creating a new ArrayList.

Here is the rough code:

foreach(word1 in str1)

    foreach(word2 in str2)
        res1 = getLevensteinDist(word1, word2)
    endforeach

    foreach(word2 in combinedstr2)
        res2 = getLevensteinDist(word1, word2)
    endforeach

    return getHigherPercent(res1, res2)

endforeach
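The combining step could look roughly like the sketch below. It assumes that only directly neighbouring words are joined and that comparison is case-insensitive; the class and method names (AdjacentWordMerger, tokenize, withMergedNeighbours) are made up for illustration and are not my actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class AdjacentWordMerger {

    // Splits a product name on whitespace into lower-cased words.
    static List<String> tokenize(String s) {
        String trimmed = s.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Returns the original words followed by every pair of neighbouring
    // words joined without a space (assumption: only adjacent pairs).
    static List<String> withMergedNeighbours(List<String> words) {
        List<String> out = new ArrayList<>(words);
        for (int i = 0; i + 1 < words.size(); i++) {
            out.add(words.get(i) + words.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = tokenize("Apple iPhone 3 gs 16GB black");
        // The candidate list now also contains "3gs", built from "3" and "gs".
        System.out.println(withMergedNeighbours(words));
    }
}
```

Each word from the other string can then be compared against this candidate list instead of the plain word list.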

This works if the words in str2 are split, but I can't figure out how to do the reverse: detect words in str2 that are split in str1.

I hope it's at least somewhat clear what I'm trying to do. Any help is appreciated.
