Brandon

Reputation: 745

Solr Lucene XmlQueryParser SpanNot not excluding Exclude

How do I express this standard Lucene query in XmlQueryParser syntax?

headline:(new -york)

Here's what I have so far:

{!xmlparser}
<SpanNot fieldName="headline">
  <Include>
    <SpanTerm>new</SpanTerm>
  </Include>
  <Exclude fieldName="headline">
    <SpanTerm>york</SpanTerm>
  </Exclude>
</SpanNot>

I originally didn't include fieldName="headline" for the Exclude node, but I added it when I kept getting "york" in the headlines.

Here are some of the results that are coming through:

{"id":243832340000000092, "headline":"New look pour New York"},
{"id":243661152000000019, "headline":"New York/New Market Project"},
{"id":243959040000000448, "headline":"New York Backs New Transmission Lines"}

Here's some of the debug output in the response:

"rawquerystring":"{!xmlparser}\n<SpanNot fieldName=\"headline\">\n  <Include>\n\t<SpanTerm>new</SpanTerm>\n  </Include>\n  <Exclude fieldName=\"headline\">\n\t<SpanTerm>york</SpanTerm>\n  </Exclude>\n</SpanNot>",

"querystring":"{!xmlparser}\n<SpanNot fieldName=\"headline\">\n  <Include>\n\t<SpanTerm>new</SpanTerm>\n  </Include>\n  <Exclude fieldName=\"headline\">\n\t<SpanTerm>york</SpanTerm>\n  </Exclude>\n</SpanNot>",

"parsedquery":"SpanBoostQuery(spanNot(headline:new^1.0, headline:york^1.0, 0, 0)^1.0)",

"parsedquery_toString":"spanNot(headline:new^1.0, headline:york^1.0, 0, 0)^1.0",
"QParser":"XmlQParser"

Why am I getting "New York" in my results?

Upvotes: 0

Views: 105

Answers (1)

femtoRgon

Reputation: 33351

Your query looks for spans (fragments of the field) that contain "new" but do not contain "york". A span consisting of just the term "new" satisfies that, even when "york" appears elsewhere in the headline. SpanNot is usually more useful with a SpanNear or something similar in the Include. For instance, if your Include were instead a SpanNear for the terms "new" and "term2", you could match "new other stuff term2 york etc", because the instance of "york" falls outside the span matched by the Include, but "new york term2" would not match, because "york" falls within the SpanNear.
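To make the span behaviour concrete, here is a toy Python model of it. This is a sketch, not Lucene code: `tokenize`, `term_spans`, `ordered_near_spans` and `span_not` are hypothetical helpers, the tokenizer is a crude stand-in for whatever analyzer the field uses, and the overlap rule is my assumption about how SpanNot treats its pre/post distances.

```python
import re

def tokenize(text):
    # Crude stand-in for an analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())

def term_spans(tokens, term):
    # A term at position i yields the half-open span (i, i + 1).
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok == term]

def ordered_near_spans(tokens, first, second, slop):
    # Simplified ordered "near": for each `first`, take the closest later
    # `second` with at most `slop` tokens in between.
    spans = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for j in range(i + 1, min(len(tokens), i + slop + 2)):
            if tokens[j] == second:
                spans.append((i, j + 1))
                break
    return spans

def span_not(include, exclude, pre=0, post=0):
    # Keep an include span unless some exclude span overlaps it after the
    # include span is widened by `pre` positions before and `post` after.
    kept = []
    for start, end in include:
        overlaps = any(x_start < end + post and x_end > start - pre
                       for x_start, x_end in exclude)
        if not overlaps:
            kept.append((start, end))
    return kept

headline = tokenize("New look pour New York")
new = term_spans(headline, "new")
york = term_spans(headline, "york")
print(span_not(new, york))          # [(0, 1), (3, 4)]  both "new" spans survive
print(span_not(new, york, post=1))  # [(0, 1)]          only the first "new" survives

near_ok = tokenize("new other stuff term2 york etc")
print(span_not(ordered_near_spans(near_ok, "new", "term2", 5),
               term_spans(near_ok, "york")))   # [(0, 4)]  "york" is outside the span

near_bad = tokenize("new york term2")
print(span_not(ordered_near_spans(near_bad, "new", "term2", 5),
               term_spans(near_bad, "york")))  # []  "york" is inside the span
```

In this model a headline matches as soon as one include span survives. So even with a post distance of 1, "New look pour New York" would still match through its first "new"; only headlines where every "new" is directly followed by "york" would drop out.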

SpanNotQuery does have constructor arguments (pre and post, the two zeros in your parsed query) that also check within a certain distance outside of the Include span. I'm not sure whether these are supported in the xmlparser (I'm not that familiar with it), but if so, I would imagine something like this:

{!xmlparser}
<SpanNot fieldName="headline">
  <Include>
    <SpanTerm>new</SpanTerm>
  </Include>
  <Exclude fieldName="headline">
    <SpanTerm>york</SpanTerm>
  </Exclude>
  <Pre>0</Pre>
  <Post>1</Post>
</SpanNot>

Upvotes: 1
