solr dismax phrase search

Question

I'm building an application that uses Solr to match long queries (typically complete sentences) against indexed documents that are almost always shorter (search terms). A typical query looks like "should I buy a house now while the rates are low. We filed BR 2 yrs ago. Rent now, w/ some sch loan debt", while my indexed documents look like "buy a house" or "house loan rates".

I thought the right way to do this would be to use shingles, the dismax parser, and a highly boosted "pf" field. So I have a "normal" text field, kw_stopped (text_en in Solr 3.4) with a very aggressive stopword list, and a kw_phrases field which is meant to hold the phrase shingles. Its definition looks like this:

and my schema fields look like this:

My search handler config is this:

  edismax
  explicit
  0.1
  keywords
  1
  kw_stopped^1.0 kw_phrases^5.0
  kw_phrases^50.0
  3
  3
  *:*
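
For anyone who wants to reproduce this without the handler defaults, the same settings can probably be passed as request parameters. The following is only a sketch: the Solr URL is a hypothetical placeholder, and it assumes that the two boosted field lists above are the "qf" and "pf" values respectively. Only `q`, `defType`, `qf`, `pf` and `debugQuery` are set; the remaining values in the config are left out.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust to the actual host, core and handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_url(user_query):
    """Build a select URL that runs the query through edismax with debug output."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # Assumption: this boosted list is the "qf" value from the config above.
        "qf": "kw_stopped^1.0 kw_phrases^5.0",
        # The highly boosted phrase field.
        "pf": "kw_phrases^50.0",
        # Ask Solr to include the parsed query and score explanations.
        "debugQuery": "true",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_debug_url("buy a house now")
print(url)
```

Fetching that URL should return the parsed query in the debug section of the response, which makes it easier to compare different qf/pf combinations.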

When I turn on debugQuery, I notice that "kw_phrases" is never matched unless the query and the document are exactly the same. The parsedquery also shows that each token from the query appears as its own DisjunctionMaxQuery clause for "kw_stopped", but all the shingles are put into one giant clause for the kw_phrases field.
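
To make the intended behaviour concrete, here is a minimal Python sketch of the matching I am hoping for. It is not Solr code and does not reproduce my analyzer chain: the `shingles` function and the shingle sizes (2 to 3 words) are assumptions chosen only for illustration. The idea is that every word shingle of the long query is compared individually against the short indexed phrases.

```python
def shingles(text, min_size=2, max_size=3):
    """Return all word n-grams of the text with min_size..max_size words."""
    words = text.lower().split()
    out = []
    for n in range(min_size, max_size + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


query = "should I buy a house now while the rates are low"
indexed = ["buy a house", "house loan rates"]

# Each shingle is treated as an independent candidate, not as one big phrase.
query_shingles = set(shingles(query))
matches = [doc for doc in indexed if doc in query_shingles]
print(matches)  # ['buy a house']
```

In this sketch "buy a house" matches because it is one of the query's three-word shingles, even though the full query is much longer than the document. What I see in Solr instead is that the phrase field only matches when the whole query equals the document.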

Where is the gap in my understanding? How can I make this work?

thanks! Vijay
