IcedDante

Reputation: 6832

Amazon CloudSearch matching long strings against domain documents

I'm integrating the Amazon CloudSearch API and was wondering how well it would work for "vague" queries.

We have records that contain descriptions, and we want to find matches based on the content of those descriptions. For example, our domain dataset has the following strings (where each string is a different document):

"The sun is shining bright today"

"The moon is shining in the sky tonight"

"The rain is pouring outside today"

If I were to submit a description to the server like this:

"The sun and moon are shining bright lately" 

is there a search method that would return the first two documents as matches (albeit with a low score)? Only the key words matter; words like "the" and "is" should be ignored. If so, how is that search constructed?

Upvotes: 0

Views: 109

Answers (1)

alexroussos

Reputation: 2681

I was eventually able to get those strings returned by a query based on "The sun and moon are shining bright lately". I did it by OR-ing the terms together in a structured query like this:

(or name:'sun' name:'moon' name:'shining' name:'bright' name:'lately')

I also removed stopwords from the query, but I don't think you need to.
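For what it's worth, here is a minimal sketch of how a query like the one above could be assembled from a free-text description. The function name `build_or_query`, the small stopword set, and the tokenizing regex are my own assumptions for illustration; only the `(or name:'...')` shape and the `name` field come from the query above. It just builds the query string and does not call CloudSearch.

```python
import re

# Illustrative stopword set (an assumption, not CloudSearch's own list).
STOPWORDS = {"the", "is", "are", "and", "in", "a", "an", "of", "to"}

def build_or_query(description, field="name", stopwords=STOPWORDS):
    """Turn free text into a structured OR query over a single field."""
    terms = []
    # Lowercase and keep alphanumeric runs only, so quotes and other
    # punctuation never end up inside the quoted terms.
    for word in re.findall(r"[a-z0-9]+", description.lower()):
        if word not in stopwords and word not in terms:
            terms.append(word)
    if not terms:
        raise ValueError("description contains no searchable terms")
    clauses = " ".join(f"{field}:'{term}'" for term in terms)
    return f"(or {clauses})"

print(build_or_query("The sun and moon are shining bright lately"))
# (or name:'sun' name:'moon' name:'shining' name:'bright' name:'lately')
```

The resulting string would be sent as the `q` parameter with `q.parser=structured`.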

It's definitely not pretty. The problem I had with other approaches was that CloudSearch seems to require every query term to be present in a document, so a word like "lately" in the query caused it to match none of the test strings. I was hoping to fix that with a rank expression, but as far as I can tell a rank expression only reorders documents that already matched; it can't bring in documents that didn't match the query at all.

I also played around with sloppy phrase search, but that still requires all of the words to be found within some distance of each other, whereas in this case certain words aren't found at all.

The only other thing I can think to try is the lucene and dismax query parsers. They won't change the underlying search engine, but they may give you a different way of specifying a query that works better.
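As a rough, untested sketch of what the lucene parser route might look like: the same OR logic can be written in standard Lucene syntax and sent with `q.parser=lucene`. The helper name `build_lucene_params` is hypothetical, and the `name` field is carried over from the structured query above. I haven't verified that this scores any differently, and the terms are assumed to be plain words with no Lucene special characters.

```python
def build_lucene_params(terms, field="name"):
    """Build request parameters for an OR query using Lucene syntax."""
    if not terms:
        raise ValueError("at least one term is required")
    # Lucene field grouping: name:(sun OR moon OR ...)
    joined = " OR ".join(terms)
    return {"q": f"{field}:({joined})", "q.parser": "lucene"}

params = build_lucene_params(["sun", "moon", "shining", "bright", "lately"])
print(params["q"])
# name:(sun OR moon OR shining OR bright OR lately)
```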

Upvotes: 1
