Reputation: 6624
How can I go about querying a database for items that are not only exactly similar to a sample, but also those that are almost similar? Almost as search engines work, but only for a small project, preferably in Java. For example:
String sample = "Sample";
I would like to retrieve all the following whenever I query sample
:
String exactMatch = "Sample";
String nonExactMatch = "S amp le";
String nonExactMatch_2 = "ampls";
Upvotes: 0
Views: 36
Reputation: 57388
You need to define what similar means in terms that your database can understand.
Some possibilities include Levenshtein distance, for example.
In your example, sample matches...
..."Sample", if you search without case sensitivity.
..."S amp le", if you remove a set of ignored characters (here space only) from both the query string and the target string. You can store the new value in the database:
ActualValue SearchFor
John Q. Smith johnqsmith%
When someone searches for "John Q. Smith, Esq." you can boil it down to johnqsmithesq and run
WHERE 'johnqsmithesq' LIKE SearchFor
"ampls" is more tricky. Why is it that 'ampls' is matched by 'sample'? A common substring? A number of shared letters? Does their order count (i.e. are anagrams valid)? Many approaches are possible, but it is you who must decide. You might use Levenshtein distance, or maybe store a string such as "100020010003..." where every digit encodes the number of letters you have, up to 9 (so 3 C's and 2 B's but no A's would give "023...") and then run the Levenshtein distance between this syndrome and the one from each term in the DB:
ActualValue Search1 Rhymes abcdefghij_Contains anagramOf
John Q. Smith johnqsmith% ith 0000000211011... hhijmnoqst
...and so on.
One approach is to ask oneself, how must I transform both searched value and value searched for, so that they match?, and then proceed and implement that in code.
Upvotes: 2
Reputation: 18449
You can use match_against in myisam full text indexes columns.
Upvotes: 0