How do I express this standard Lucene query in XmlQueryParser syntax?
headline:(new -york)
Here's what I have so far:
{!xmlparser}
<SpanNot fieldName="headline">
  <Include>
    <SpanTerm>new</SpanTerm>
  </Include>
  <Exclude fieldName="headline">
    <SpanTerm>york</SpanTerm>
  </Exclude>
</SpanNot>
I originally didn't include fieldName="headline" for the Exclude node, but I added it when I kept getting "york" in the headlines.
Here are some of the results that are coming through:
{"id":243832340000000092, "headline":"New look pour New York"},
{"id":243661152000000019, "headline":"New York/New Market Project"},
{"id":243959040000000448, "headline":"New York Backs New Transmission Lines"}
Here's some of the debug output in the response:
"rawquerystring":"{!xmlparser}\n<SpanNot fieldName=\"headline\">\n <Include>\n\t<SpanTerm>new</SpanTerm>\n </Include>\n <Exclude fieldName=\"headline\">\n\t<SpanTerm>york</SpanTerm>\n </Exclude>\n</SpanNot>",
"querystring":"{!xmlparser}\n<SpanNot fieldName=\"headline\">\n <Include>\n\t<SpanTerm>new</SpanTerm>\n </Include>\n <Exclude fieldName=\"headline\">\n\t<SpanTerm>york</SpanTerm>\n </Exclude>\n</SpanNot>",
"parsedquery":"SpanBoostQuery(spanNot(headline:new^1.0, headline:york^1.0, 0, 0)^1.0)",
"parsedquery_toString":"spanNot(headline:new^1.0, headline:york^1.0, 0, 0)^1.0",
"QParser":"XmlQParser"
So why am I getting "New York" in my results?
Your query is looking for spans (fragments of the field) that contain "new" but do not overlap "york". Because your Include is the single term "new", each candidate span covers only that one position and can never contain "york", so simply having the term "new" anywhere in the headline is enough to match. SpanNot is usually combined with a SpanNear or something similar in the Include, which makes it more useful. For instance, if your Include were instead a SpanNear for the terms "new" and "term2", you could match "new other stuff term2 york etc", because that instance of "york" falls outside the span matched by the Include, but "new york term2" would not match, because "york" falls within the SpanNear's span.
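To make the SpanNear case concrete, the query could look something like the following. This is only a sketch: "term2" is the placeholder term from the example above, and slop="2" is an assumed value, chosen so that "new other stuff term2" (two positions between "new" and "term2") still fits inside the Include span.

```xml
{!xmlparser}
<SpanNot fieldName="headline">
  <!-- Include: a span running from "new" to "term2", in order, with up to 2 positions in between -->
  <Include>
    <SpanNear slop="2" inOrder="true">
      <SpanTerm>new</SpanTerm>
      <SpanTerm>term2</SpanTerm>
    </SpanNear>
  </Include>
  <!-- Exclude: the span is rejected only if "york" falls inside that new...term2 span -->
  <Exclude fieldName="headline">
    <SpanTerm>york</SpanTerm>
  </Exclude>
</SpanNot>
```

With this query, "new york term2" is rejected (the span covers "york"), while "new other stuff term2 york etc" still matches (its "york" lies after the end of the span).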
SpanNotQuery actually does have constructor arguments, pre and post, that you could use to also exclude matches within a certain distance before or after the Include span (the trailing 0, 0 in your parsedquery are those two values). I'm not sure whether the xmlparser lets you set them (I'm not that familiar with it), but if so, I would imagine something like this:
{!xmlparser}
<SpanNot fieldName="headline">
  <Include>
    <SpanTerm>new</SpanTerm>
  </Include>
  <Exclude fieldName="headline">
    <SpanTerm>york</SpanTerm>
  </Exclude>
  <Pre>0</Pre>
  <Post>1</Post>
</SpanNot>
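One caveat, even if Pre/Post are accepted: a span query matches a document as soon as any single "new" span survives the exclusion. Each of the three headlines in the question also contains a "new" that is not followed by "york" ("New look", "New Market", "New Transmission"), so a SpanNot with Post set to 1 would still return them. The standard query headline:(new -york) excludes at the document level instead, which in xmlparser syntax should correspond to a BooleanQuery with a mustNot clause rather than a SpanNot. A minimal sketch, assuming the stock Lucene XML query builders (BooleanQuery, Clause, TermQuery) with fieldName inherited from the parent element:

```xml
{!xmlparser}
<BooleanQuery fieldName="headline">
  <!-- the document must contain "new" somewhere in headline -->
  <Clause occurs="must">
    <TermQuery>new</TermQuery>
  </Clause>
  <!-- and must not contain "york" anywhere in headline -->
  <Clause occurs="mustNot">
    <TermQuery>york</TermQuery>
  </Clause>
</BooleanQuery>
```

With the default OR operator, the literal translation of headline:(new -york) would use occurs="should" for "new"; since it is the only positive clause, it behaves the same as must here.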