Reputation: 2620
MarkLogic version: 8.0-6.3
Let me explain the issue with an example.
Insert the documents below into the database:
xdmp:document-insert('/sample/1.xml', <data>Türkiye Araştırmaları Literatür Dergisi</data>);
xdmp:document-insert('/sample/2.xml', <data>Türk-İslâm Medeniyeti Akademik Araştırmalar Dergisi/Journal of the Academic Studies of Turkish-Islamic Civilization</data>);
xdmp:document-insert('/sample/3.xml', <data>Österreich in Geschichte und Literatur (mit Geographie)</data>);
xdmp:document-insert('/sample/4.xml', <data>Uluslararası Karadeniz Havzası Halk Bilimi Araştırmaları Dergisi</data>);
xdmp:document-insert('/sample/5.xml', <data>Süleyman Demirel Üniversitesi Fen-Edebiyat Fakültesi Sosyal Bilimler Dergisi</data>);
xdmp:document-insert('/sample/6.xml', <data>Tarih İncelemeleri Dergisi</data>);
xdmp:document-insert('/sample/7.xml', <data>Literatur und Kritik</data>);
xdmp:document-insert('/sample/8.xml', <data>Cumhuriyet Tarihi Araştırmaları Dergisi</data>);
xdmp:document-insert('/sample/9.xml', <data>Divan Edebiyatı Araştırmaları Dergisi/The Journal of Ottoman Literature Studies</data>);
xdmp:document-insert('/sample/10.xml', <data>Krieg und Literatur/War and Literature</data>);
xdmp:document-insert('/sample/11.xml', <data>Trakya Üniversitesi Edebiyat Fakültesi Dergisi</data>);
xdmp:document-insert('/sample/12.xml', <data>Jahrbuch zur Kultur und Literatur der Weimarer Republik</data>);
cts query:
cts:search(
  doc(),
  cts:element-word-query(
    xs:QName('data'),
    "Türk?ye Arast?rmalar? L?teratür Derg?s?",
    ("case-insensitive","diacritic-insensitive","punctuation-insensitive","stemmed","wildcarded","lang=en")
  ),
  'unfiltered'
)
Output:
All of the documents inserted above are returned.
Expected output:
Only the /sample/1.xml document should be returned.
Database Config:
<config>
<name>content</name>
<package-database-properties>
<enabled>true</enabled>
<retired-forest-count>0</retired-forest-count>
<language>en</language>
<stemmed-searches>advanced</stemmed-searches>
<word-searches>true</word-searches>
<word-positions>true</word-positions>
<fast-phrase-searches>true</fast-phrase-searches>
<fast-reverse-searches>false</fast-reverse-searches>
<triple-index>false</triple-index>
<triple-positions>false</triple-positions>
<fast-case-sensitive-searches>true</fast-case-sensitive-searches>
<fast-diacritic-sensitive-searches>true</fast-diacritic-sensitive-searches>
<fast-element-word-searches>true</fast-element-word-searches>
<element-word-positions>true</element-word-positions>
<fast-element-phrase-searches>true</fast-element-phrase-searches>
<element-value-positions>true</element-value-positions>
<attribute-value-positions>true</attribute-value-positions>
<field-value-searches>true</field-value-searches>
<field-value-positions>true</field-value-positions>
<three-character-searches>true</three-character-searches>
<three-character-word-positions>true</three-character-word-positions>
<fast-element-character-searches>true</fast-element-character-searches>
<trailing-wildcard-searches>true</trailing-wildcard-searches>
<trailing-wildcard-word-positions>true</trailing-wildcard-word-positions>
<fast-element-trailing-wildcard-searches>true</fast-element-trailing-wildcard-searches>
<word-lexicons>
<word-lexicon>http://marklogic.com/collation/codepoint</word-lexicon>
</word-lexicons>
<two-character-searches>false</two-character-searches>
<one-character-searches>false</one-character-searches>
<uri-lexicon>true</uri-lexicon>
<collection-lexicon>true</collection-lexicon>
<reindexer-enable>true</reindexer-enable>
<reindexer-throttle>5</reindexer-throttle>
<reindexer-timestamp>0</reindexer-timestamp>
<directory-creation>manual</directory-creation>
<maintain-last-modified>false</maintain-last-modified>
<maintain-directory-last-modified>false</maintain-directory-last-modified>
<inherit-permissions>false</inherit-permissions>
<inherit-collections>false</inherit-collections>
<inherit-quality>false</inherit-quality>
<in-memory-limit>262144</in-memory-limit>
<in-memory-list-size>512</in-memory-list-size>
<in-memory-tree-size>128</in-memory-tree-size>
<in-memory-range-index-size>16</in-memory-range-index-size>
<in-memory-reverse-index-size>16</in-memory-reverse-index-size>
<in-memory-triple-index-size>64</in-memory-triple-index-size>
<large-size-threshold>1024</large-size-threshold>
<locking>fast</locking>
<journaling>fast</journaling>
<journal-size>2047</journal-size>
<journal-count>2</journal-count>
<preallocate-journals>false</preallocate-journals>
<preload-mapped-data>false</preload-mapped-data>
<preload-replica-mapped-data>false</preload-replica-mapped-data>
<range-index-optimize>facet-time</range-index-optimize>
<positions-list-max-size>256</positions-list-max-size>
<format-compatibility>automatic</format-compatibility>
<index-detection>automatic</index-detection>
<expunge-locks>none</expunge-locks>
<tf-normalization>scaled-log</tf-normalization>
<merge-priority>lower</merge-priority>
<merge-max-size>49152</merge-max-size>
<merge-min-size>1024</merge-min-size>
<merge-min-ratio>1</merge-min-ratio>
<merge-timestamp>0</merge-timestamp>
<retain-until-backup>false</retain-until-backup>
<rebalancer-enable>true</rebalancer-enable>
<rebalancer-throttle>5</rebalancer-throttle>
<assignment-policy>
<assignment-policy-name>bucket</assignment-policy-name>
</assignment-policy>
</package-database-properties>
<links>
<forests-list>
<forest-name>r-f4</forest-name>
<forest-name>r-f3</forest-name>
<forest-name>r-f2</forest-name>
<forest-name>r-f1</forest-name>
</forests-list>
<security-database>Security</security-database>
<schema-database>Schemas</schema-database>
<triggers-database>Triggers</triggers-database>
</links>
</config>
I cannot work out what went wrong or why I am getting this output.
It seems that a document is returned as a match if even a single word of the query is present in its data element.
Please help me understand what I am doing wrong.
Upvotes: 2
Views: 134
Reputation: 4912
Look at the output of xdmp:plan for your search. I expect the insensitive options are defeating the wildcard optimization, so the index resolution ends up with a very weak query.
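As a minimal sketch of that check, the search from the question can be wrapped in xdmp:plan unchanged. This assumes it is run in Query Console against the same content database; the exact shape of the plan output depends on the server version and index settings, so none is shown here.

xquery version "1.0-ml";

(: Wrap the original search in xdmp:plan to see which index terms
   the query is actually resolved to. The query itself is unchanged. :)
xdmp:plan(
  cts:search(
    doc(),
    cts:element-word-query(
      xs:QName('data'),
      "Türk?ye Arast?rmalar? L?teratür Derg?s?",
      ("case-insensitive","diacritic-insensitive","punctuation-insensitive","stemmed","wildcarded","lang=en")
    ),
    'unfiltered'
  )
)

Because the search is 'unfiltered', whatever the index resolution selects is returned without being verified against the document content. If the plan shows only a few broad terms, that would be consistent with every document coming back.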
Upvotes: 1