Program-Me-Rev
Program-Me-Rev

Reputation: 6624

How can I go about querying a database for non-similar, but almost matching items

How can I go about querying a database for items that are not only exactly similar to a sample, but also those that are almost similar? Almost as search engines work, but only for a small project, preferably in Java. For example:

String sample = "Sample";

I would like to retrieve all the following whenever I query sample:

String exactMatch = "Sample";
String nonExactMatch = "S   amp   le";
String nonExactMatch_2 = "ampls";

Upvotes: 0

Views: 36

Answers (2)

LSerni
LSerni

Reputation: 57388

You need to define what similar means in terms that your database can understand.

Some possibilities include Levenshtein distance, for example.

In your example, sample matches...

..."Sample", if you search without case sensitivity.

..."S amp le", if you remove a set of ignored characters (here space only) from both the query string and the target string. You can store the new value in the database:

   ActualValue      SearchFor
   John Q. Smith    johnqsmith%

When someone searches for "John Q. Smith, Esq." you can boil it down to johnqsmithesq and run

   WHERE 'johnqsmithesq' LIKE SearchFor

"ampls" is more tricky. Why is it that 'ampls' is matched by 'sample'? A common substring? A number of shared letters? Does their order count (i.e. are anagrams valid)? Many approaches are possible, but it is you who must decide. You might use Levenshtein distance, or maybe store a string such as "100020010003..." where every digit encodes the number of letters you have, up to 9 (so 3 C's and 2 B's but no A's would give "023...") and then run the Levenshtein distance between this syndrome and the one from each term in the DB:

 ActualValue    Search1     Rhymes  abcdefghij_Contains  anagramOf
 John Q. Smith  johnqsmith% ith     0000000211011...     hhijmnoqst

...and so on.

One approach is to ask oneself, how must I transform both searched value and value searched for, so that they match?, and then proceed and implement that in code.

Upvotes: 2

Aman Aggarwal
Aman Aggarwal

Reputation: 18449

You can use match_against in myisam full text indexes columns.

Upvotes: 0

Related Questions