Yellowfog

Reputation: 2343

Solr: matching on phrases contained in the query string

I've just started trying to use Solr, and already I think that I'm attempting to use it backwards. Could someone let me know if what I'm trying to do is possible?

In normal use, one might specify a phrase and then search stored documents for instances of that phrase. However, I have a list of stored phrases and I'm trying to determine which of those phrases my query string contains.

For instance: suppose that I have phrases like these stored in Solr:

1:"fish fingers" 
2:"apple pie"

If my search term is "I like fish fingers" then I want Solr to return the first record. If it's "I like fish fingers and apple pie" then I want it to return both records. But if it's "I like apple fingers and fish pie" then I want it to return no records.

(Of course, if the phrases were always two words long, it would be fairly simple to do this by constructing a disjunctive query over all the two-word phrases in the query string. But the phrases can potentially be any length.)
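To pin down the required behaviour, here is a plain-Python reference check with no Solr involved: a stored phrase matches when its words appear as a contiguous run in the query string. The `tokenize` helper (lower-case, split on non-word characters) is an assumption standing in for a real Solr analyzer. It scans every stored phrase for every query, which is exactly the work one would like the index to do instead.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case and split on non-word characters (a crude stand-in for a Solr tokenizer)."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(query_tokens: list[str], phrase_tokens: list[str]) -> bool:
    """True if phrase_tokens occurs as a contiguous run inside query_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(query_tokens[i:i + n] == phrase_tokens
               for i in range(len(query_tokens) - n + 1))


def matching_phrases(stored: dict[int, str], query: str) -> list[int]:
    """Return the ids of stored phrases that appear, word for word, in the query."""
    q = tokenize(query)
    return [pid for pid, phrase in stored.items() if contains_phrase(q, tokenize(phrase))]


stored = {1: "fish fingers", 2: "apple pie"}
print(matching_phrases(stored, "I like fish fingers"))                # [1]
print(matching_phrases(stored, "I like fish fingers and apple pie"))  # [1, 2]
print(matching_phrases(stored, "I like apple fingers and fish pie"))  # []
```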

Thanks for any help.

Upvotes: 4

Views: 2925

Answers (3)

Yellowfog

Reputation: 2343

I decided to read through the documentation on each Filter and Tokenizer, which is where I came across this description of the PositionFilterFactory:

"Another example is when exact matching hits are wanted for _any_ shingle within the query."

The configuration given on this page is nearly exactly what I want. Unfortunately, since there doesn't seem to be a filter which glues terms split by the tokenizer back into a single token, I can't do any stemming. But maybe I can knock up such a filter myself.
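The idea behind that configuration can be emulated in plain Python to see why it works: each stored phrase is indexed as a single term (the whole phrase), the query is expanded into every shingle up to some maximum length, and a record matches when one of its terms equals one of those shingles. Stemming survives if both sides are stemmed token by token and then glued back into one term. This is a sketch under assumptions, not Solr API: the `normalize` "stemmer" is a hypothetical toy placeholder, and `glue` is a hypothetical stand-in for the missing filter described above.

```python
import re


def normalize(token: str) -> str:
    # Hypothetical toy stemmer: strip a trailing "s" from longer words.
    # A real setup would use a proper stemmer; this is only a placeholder.
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def analyze(text: str) -> list[str]:
    """Tokenize on word characters, lower-case, then stem each token."""
    return [normalize(t) for t in re.findall(r"\w+", text.lower())]


def glue(tokens: list[str]) -> str:
    """Join stemmed tokens back into one term (the hypothetical 'glue' filter)."""
    return " ".join(tokens)


def build_index(phrases: dict[int, str]) -> dict[str, set[int]]:
    """Index each stored phrase as a single glued term mapping to its ids."""
    index: dict[str, set[int]] = {}
    for pid, phrase in phrases.items():
        index.setdefault(glue(analyze(phrase)), set()).add(pid)
    return index


def query_shingles(query: str, max_size: int) -> set[str]:
    """All contiguous n-grams of the analyzed query, 1 <= n <= max_size."""
    toks = analyze(query)
    return {glue(toks[i:i + n])
            for n in range(1, max_size + 1)
            for i in range(len(toks) - n + 1)}


def search(index: dict[str, set[int]], query: str, max_size: int = 4) -> set[int]:
    """Ids whose glued phrase term exactly equals some query shingle."""
    hits: set[int] = set()
    for shingle in query_shingles(query, max_size):
        hits |= index.get(shingle, set())
    return hits


index = build_index({1: "fish fingers", 2: "apple pie"})
print(sorted(search(index, "I like fish finger and apple pies")))  # [1, 2]
print(sorted(search(index, "I like apple fingers and fish pie")))  # []
```

`max_size` has to be at least the word count of the longest stored phrase. Because every phrase is a single term, a stored "nice fish fingers" would not be hit by the shingle "fish fingers".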

Upvotes: 2

Yuval F

Reputation: 20621

I believe shingles (token n-grams used for matching) could be a starting point for solving your problem.

Check out ShingleFilterFactory and ShingleFilter.
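To illustrate what a shingle is, here is a small Python function that emits word n-grams from a token list. Its parameters are loosely modelled on ShingleFilterFactory's `minShingleSize`, `maxShingleSize`, `outputUnigrams` and `tokenSeparator`, but it is only an approximation for illustration; the exact token order and position handling of Lucene's ShingleFilter may differ.

```python
def shingles(tokens: list[str], min_size: int = 2, max_size: int = 2,
             output_unigrams: bool = True, sep: str = " ") -> list[str]:
    """Emit word n-grams starting at each position, roughly like a shingle filter."""
    out: list[str] = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        # Longer n-grams starting at position i, as long as they fit in the input.
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


print(shingles("i like fish fingers".split(), max_size=3, output_unigrams=False))
# ['i like', 'i like fish', 'like fish', 'like fish fingers', 'fish fingers']
```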

Upvotes: 2

Jayendra

Reputation: 52779

This seems to be the same functionality as the KeyMatch feature of the Google Search Appliance, which matches the indexed terms against the query rather than the other way around. We also had to implement a custom solution for it.

You would probably need to implement your own query parser for this. And, as you already mentioned, that is probably the only solution available.

  • Generate all contiguous word sequences from the search terms, e.g. I like fish fingers -> i like, like fish, fish fingers, i like fish, like fish fingers, i like fish fingers
  • Create a disjunction max query containing a phrase query for each of the above combinations, added as SHOULD boolean clauses, so that a record matches if any of the phrases match.
  • However, this alone would not give you exact matches.
  • There is one more caveat: if a stored phrase is something like "nice fish fingers", the query "i like fish fingers" would still match that record, because "fish fingers" occurs inside it. So you may need an additional check (I can describe the workaround we used).
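A rough sketch of the first two steps plus an exact-match check, in Python. It builds a standard Lucene query-syntax string of OR-ed (SHOULD) phrase clauses rather than a programmatic DisjunctionMaxQuery; the field name `phrase` is hypothetical, and no escaping of query-syntax special characters is done. `exact_hits` is one possible post-filter for the "nice fish fingers" caveat, not necessarily the workaround referred to above.

```python
def sub_phrases(query: str, min_len: int = 2) -> list[str]:
    """All contiguous word sequences of at least min_len words (lower-cased)."""
    toks = query.lower().split()
    return [" ".join(toks[i:j])
            for i in range(len(toks))
            for j in range(i + min_len, len(toks) + 1)]


def or_phrase_query(query: str, field: str = "phrase") -> str:
    """Lucene-syntax query: one quoted phrase clause per sub-phrase, joined by OR.

    The field name is hypothetical; special characters are NOT escaped here.
    """
    return " OR ".join(f'{field}:"{p}"' for p in sub_phrases(query))


def exact_hits(query: str, candidates: list[str]) -> list[str]:
    """Post-filter: keep only stored phrases equal to one of the query's sub-phrases."""
    wanted = set(sub_phrases(query, min_len=1))
    return [c for c in candidates if " ".join(c.lower().split()) in wanted]


print(or_phrase_query("I like fish fingers"))
print(exact_hits("I like fish fingers", ["fish fingers", "nice fish fingers"]))  # ['fish fingers']
```

The number of sub-phrases grows quadratically with the query length, so capping it at the length of the longest stored phrase keeps the generated query manageable.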

Upvotes: 1
