Reputation: 2192
I just wanted to create an unfiltered whitespace-sensitive cts query and somehow cannot get it working (without other constraints).
This is my testing environment:
xquery version "1.0-ml";
xdmp:document-insert("test1.xml", <test><title>test word</title></test>);
xdmp:document-insert("test2.xml", <test><title>test-word</title></test>);
cts:search(//test, cts:element-value-query(xs:QName("title"), "test word", ("whitespace-sensitive")), ("unfiltered"))
I have two documents which differ in only a single character, "-". Executing this cts search returns both documents. Looking at the execution plan, things get strange. The final-plan shows this:
<qry:term-query weight="1">
<qry:key>5029803220044614354</qry:key>
<qry:annotation>element(title,value("test","word"))</qry:annotation>
</qry:term-query>
MarkLogic seems to search for the two words "test" and "word" without whitespace. It does not seem to use the option "whitespace-sensitive". Only if I add three more options, "case-sensitive", "diacritic-sensitive" and "punctuation-sensitive", does it do the actual whitespace-sensitive search. Removing any of the options results in a whitespace-insensitive search:
xdmp:plan(cts:search(//test, cts:element-value-query(xs:QName("title"), "test word", ("case-sensitive", "diacritic-sensitive", "punctuation-sensitive", "whitespace-sensitive")), ("unfiltered")))
=> ...
<qry:term-query weight="1">
<qry:key>11298961959398038325</qry:key>
<qry:annotation>element(title,value("test"," ","word"))</qry:annotation>
</qry:term-query>
Am I misunderstanding the option "whitespace-sensitive"?
Using MarkLogic 9.8-0.
Upvotes: 1
Reputation: 4912
I think perhaps you are expecting all options to be resolvable unfiltered. That is not the case. Some options, and some combinations of options and index settings, cannot be resolved without filtering. In general, whitespace-sensitive queries can be resolved from the index only if the query is an "exact" value query, because whitespace and punctuation are generally not indexed. This is what the plan is showing you. Since the information is not available in the index, an unfiltered query cannot exclude results on that basis. The filter, which has the actual document data as well, can exclude results based on the whitespace and return correct results.
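As a rough sketch of the two ways out, reusing the test documents and the "title" element from the question: the first search uses the "exact" option, which is shorthand for the case-, diacritic-, punctuation- and whitespace-sensitive options together (plus unstemmed and unwildcarded), so it can stay unfiltered. The second keeps only "whitespace-sensitive" and leaves the whitespace check to the filter. Whether the first form is fully resolved from the index can still depend on your index settings, so it is worth confirming with xdmp:plan as in the question.

```xquery
xquery version "1.0-ml";

(: Sketch 1: an "exact" value query, searched unfiltered.
   Assumption: index settings let the exact value term be resolved,
   as the second plan in the question suggests. :)
cts:search(//test,
  cts:element-value-query(xs:QName("title"), "test word", ("exact")),
  ("unfiltered")),

(: Sketch 2: only "whitespace-sensitive", searched filtered.
   The index still yields both candidates; the filter compares against
   the actual document content and should discard the non-matching one. :)
cts:search(//test,
  cts:element-value-query(xs:QName("title"), "test word", ("whitespace-sensitive")),
  ("filtered"))
```

The filtered form costs more per candidate because each document has to be read, so the "exact" form is usually the better fit when an exact match is what you actually want.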
Upvotes: 3