Reputation: 2192
I have created a simple field index which looks like this:
content
I am creating a document with an element content and two child elements, header and body. The second request uses the field index to find all values and tests whether they contain the word Body. As expected, they do. I then update my document so that it no longer has the body element and request the field index words again. The field index still contains the word Body. This is my test script:
xquery version "1.0-ml";

xdmp:document-insert("test.xml",
  <test>
    <title>not found</title>
    <content>
      <header>Found</header>
      <body>Body</body>
    </content>
  </test>
);

fn:exists(fn:index-of(
  cts:field-words("root_test", (), ("collation=http://marklogic.com/collation/de/S1")),
  "Body"
)) = fn:true();

xdmp:document-insert("test.xml",
  <test>
    <title>not found</title>
    <content>
      <header>Found</header>
    </content>
  </test>
);

fn:empty(fn:index-of(
  cts:field-words("root_test", (), ("collation=http://marklogic.com/collation/de/S1")),
  "Body"
)) = fn:true()
I expected the following output:
true
true
But what I actually get is:
true
false
The word Body is removed from the field index only if I run a manual merge after the update (the second insert).
Am I doing something wrong here? I am using MarkLogic 9.0-8.
Upvotes: 0
Reputation: 4912
The word lexicon doesn't keep track of specific document instances (doing so would be prohibitively expensive), so it cannot purge information about deleted words until after a merge. Word lexicons are meant for query suggestion and to assist certain wildcard queries; you shouldn't count on them to provide precise information about the presence or absence of specific words in the corpus.
If you want to know whether a specific word is in the corpus, do an estimate of a word query, e.g. xdmp:estimate(cts:search(doc(),cts:word-query("Body",("unstemmed","case-insensitive","diacritic-insensitive")))). That won't give quite the same equality constraints as your collation, however, because search is codepoint based and doesn't fold compatibility characters and the like.
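As a rough sketch, the second check in the test script could be rewritten with this estimate in place of the lexicon lookup. It assumes the database's index settings support unstemmed word queries. It also searches the whole document, not only the content covered by the root_test field, so treat it as an illustration and not a drop-in replacement.

```xquery
xquery version "1.0-ml";

(: Sketch: ask the search indexes, not the word lexicon, whether any
   document still contains the word "Body". :)
let $hits := xdmp:estimate(
  cts:search(
    fn:doc(),
    cts:word-query("Body", ("unstemmed", "case-insensitive", "diacritic-insensitive"))
  )
)
(: true when no document contains the word any more :)
return $hits eq 0
```

To limit the check to the field, cts:field-word-query("root_test", "Body", ...) could be substituted for cts:word-query, assuming the field is configured to support word queries.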
Upvotes: 2