Dixit Singla

Reputation: 2620

MarkLogic unfiltered diacritic search is not working as expected

I am using MarkLogic 8.

An unfiltered cts:search does not behave as expected with a diacritic-sensitive cts:query.

Here is an example.

I inserted the following XML into the database at the URI /diacritic/a.xml:

<root>
    <name>öily</name>
</root>

This is the cts:search query I am running:

cts:search(
    doc('/diacritic/a.xml'),
    cts:element-value-query(xs:QName('name'), 'oily', ('diacritic-sensitive')),
    'unfiltered'
)

The query above returns the document, but it should not, because the diacritic-sensitive option is set.

A filtered search works correctly.

Note: "fast diacritic sensitive searches" is set to true.

Please help.

Upvotes: 0

Views: 59

Answers (1)

mholstege

Reputation: 4912

Collations are irrelevant for search (except for range queries).

The issue here is how indexing works. The keys do not know whether they are diacritic-sensitive or diacritic-insensitive keys; they only know what characters they contain. To compute the diacritic-insensitive key for a word, we remove the diacritics and form a key from what is left. To compute the diacritic-sensitive key for a word, we keep the diacritics and form a key from the word with them intact.

In this case diacritic-insensitive-key(oily) = diacritic-sensitive-key(oily) = diacritic-insensitive-key(öily) != diacritic-sensitive-key(öily).

So the index can't resolve the difference here. A diacritic-sensitive search for öily would not match oily in the index, but the reverse is not true.

To get an accurate result here, you'll need to filter.
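The key collision can be illustrated with a small toy model in Python. This is only a sketch of the idea described above, not MarkLogic's actual implementation: the function names (`insensitive_key`, `sensitive_key`, `index_keys`, `unfiltered_match`, `filtered_match`) are hypothetical, and real index keys are hashed and also encode things like the element name.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Decompose the word and drop combining marks (o + diaeresis -> o)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def insensitive_key(word: str) -> str:
    # Assumption: the diacritic-insensitive key is the word without diacritics.
    return strip_diacritics(word)


def sensitive_key(word: str) -> str:
    # Assumption: the diacritic-sensitive key is the word with diacritics intact.
    return unicodedata.normalize("NFC", word)


def index_keys(word: str) -> set:
    """Keys stored for an indexed word. Nothing marks which kind each key is."""
    return {insensitive_key(word), sensitive_key(word)}


def unfiltered_match(indexed_word: str, query_word: str, diacritic_sensitive: bool) -> bool:
    """Index-only resolution: look the query key up among the stored keys."""
    if diacritic_sensitive:
        key = sensitive_key(query_word)
    else:
        key = insensitive_key(query_word)
    return key in index_keys(indexed_word)


def filtered_match(indexed_word: str, query_word: str, diacritic_sensitive: bool) -> bool:
    """Index resolution followed by a check against the actual text."""
    if not unfiltered_match(indexed_word, query_word, diacritic_sensitive):
        return False
    if diacritic_sensitive:
        return sensitive_key(indexed_word) == sensitive_key(query_word)
    return insensitive_key(indexed_word) == insensitive_key(query_word)


if __name__ == "__main__":
    # Sensitive query 'oily' against indexed 'öily': index-only false positive.
    print(unfiltered_match("öily", "oily", True))
    # The reverse direction is resolved correctly from the index alone.
    print(unfiltered_match("oily", "öily", True))
    # Filtering removes the false positive.
    print(filtered_match("öily", "oily", True))
```

In this model the sensitive query key for oily is the same string as the insensitive key stored for öily, so the index-only lookup cannot tell them apart; only the comparison against the stored text rejects the document.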

Addendum: why don't we include the diacritic sensitivity in the key? Because that would drastically increase the size of the index (x2 for diacritics, x2 more for case).

Upvotes: 1
