Wagner Michael

Reputation: 2192

cts search whitespace-sensitive

I wanted to create an unfiltered, whitespace-sensitive cts query, but I cannot get it working (without other constraints).

This is my testing environment:

xquery version "1.0-ml";

xdmp:document-insert("test1.xml", <test><title>test word</title></test>);
xdmp:document-insert("test2.xml", <test><title>test-word</title></test>);


cts:search(//test, cts:element-value-query(xs:QName("title"), "test word", ("whitespace-sensitive")), ("unfiltered"))

I have two documents whose titles differ in a single character: a space in one, a hyphen in the other. Executing this cts search returns both documents. The execution plan is where things get strange; the final plan shows this:

<qry:term-query weight="1">
 <qry:key>5029803220044614354</qry:key>
 <qry:annotation>element(title,value("test","word"))</qry:annotation>
</qry:term-query>

MarkLogic seems to search for the two words test and word with no whitespace between them, so it does not appear to use the whitespace-sensitive option. Only when I add three more options, "case-sensitive", "diacritic-sensitive" and "punctuation-sensitive", does it perform an actual whitespace-sensitive search. Removing any one of them results in a whitespace-insensitive search again:

xdmp:plan(cts:search(//test, cts:element-value-query(xs:QName("title"), "test word", ("case-sensitive", "diacritic-sensitive", "punctuation-sensitive", "whitespace-sensitive")), ("unfiltered")))
=> ...
<qry:term-query weight="1">
 <qry:key>11298961959398038325</qry:key>
 <qry:annotation>element(title,value("test"," ","word"))</qry:annotation>
</qry:term-query>

Am I misunderstanding the option "whitespace-sensitive"?

Using MarkLogic 9.8-0.

Upvotes: 1

Views: 220

Answers (1)

mholstege

Reputation: 4912

I think you may be expecting every option to be resolvable unfiltered. That is not the case: some options, and some combinations of options and index settings, cannot be resolved without filtering. Whitespace and punctuation are generally not indexed, so the only circumstance in which a whitespace-sensitive query can be resolved from the index is when it is an "exact" value query. This is what the plan is showing you. Because the information is not in the index, an unfiltered query cannot exclude results on that basis. The filter, which has the actual document data as well, can exclude results based on the whitespace and return correct results.
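As a rough sketch of the two alternatives this implies, reusing the test documents from the question: the first search keeps "unfiltered" but uses the "exact" option, and the second keeps only "whitespace-sensitive" but drops "unfiltered" so that the default filtering step runs. Neither was run against the asker's database, so treat them as starting points to check with xdmp:plan rather than verified results.

```xquery
xquery version "1.0-ml";

(: Sketch only: assumes test1.xml and test2.xml from the question are loaded. :)

(: 1. Unfiltered, using "exact". This option is shorthand for case-sensitive,
      diacritic-sensitive, punctuation-sensitive, whitespace-sensitive,
      unstemmed and unwildcarded. :)
cts:search(//test,
  cts:element-value-query(xs:QName("title"), "test word", ("exact")),
  ("unfiltered")),

(: 2. Filtered (the default when "unfiltered" is omitted). The index produces
      candidates, then the filter checks each one against the document data. :)
cts:search(//test,
  cts:element-value-query(xs:QName("title"), "test word", ("whitespace-sensitive")))
```

The first form trades flexibility for index-only resolution; the second keeps the looser options but pays the cost of filtering each candidate document.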

Upvotes: 3
