Wagner Michael

Reputation: 2192

Field index only updated after merge

I have created a simple field index which looks like this:

I am creating a document with an element content that has two child elements, header and body. The second statement reads all words from the field index and tests whether they include the word Body. As expected, they do. I then update my document so that it no longer has the body element and read the field index words again. The field index still contains the word Body. This is my test script:

xquery version "1.0-ml";

xdmp:document-insert("test.xml", 
  <test>
    <title>not found</title>
    <content>
      <header>Found</header>
      <body>Body</body>
    </content>
  </test>
);
fn:exists(fn:index-of(
  cts:field-words("root_test", (), ("collation=http://marklogic.com/collation/de/S1")), 
  "Body"
)) = fn:true();

xdmp:document-insert("test.xml", 
  <test>
    <title>not found</title>
    <content>
      <header>Found</header>
    </content>
  </test>
);
fn:empty(fn:index-of(
  cts:field-words("root_test", (), ("collation=http://marklogic.com/collation/de/S1")),
  "Body"
)) = fn:true()

I expected the following output:

true
true

But what I actually get is:

true
false

The word Body is removed from the field index only if I run a manual merge after the update (the second insert).

Am I doing something wrong here? I am using MarkLogic 9.0-8.

Upvotes: 0

Views: 36

Answers (1)

mholstege

Reputation: 4912

The word lexicon doesn't keep track of specific document instances (doing so would be prohibitively expensive), so it cannot purge information about deleted words until after a merge. Word lexicons are intended for query suggestion and to assist certain wildcard queries; you shouldn't count on them to provide precise information about the presence or absence of specific words in the corpus.

If you want to know whether a specific word is in the corpus, do an estimate of a word query, e.g. xdmp:estimate(cts:search(doc(), cts:word-query("Body", ("unstemmed", "case-insensitive", "diacritic-insensitive")))). That won't give quite the same equality constraints as your collation, however, because search is codepoint-based and doesn't fold compatibility characters and the like.
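
As a rough sketch, the second check in the test script could be written as an estimate instead of a lexicon lookup. This assumes the check should be limited to the root_test field from the question, so cts:field-word-query is swapped in for cts:word-query; that substitution is my assumption, and whether the "unstemmed" option is usable depends on how the field is configured.

```xquery
xquery version "1.0-ml";

(: Sketch: ask the index whether any current document contains the word,
   instead of reading the word lexicon.
   "root_test" is the field name taken from the question; using a
   field-scoped query here is an assumption. :)
let $query := cts:field-word-query(
  "root_test",
  "Body",
  ("unstemmed", "case-insensitive", "diacritic-insensitive")
)
(: True when no current document matches the query. :)
return xdmp:estimate(cts:search(fn:doc(), $query)) eq 0
```

Run as a separate statement after the second xdmp:document-insert, this should not depend on a merge having happened, since the estimate counts matching documents rather than reading lexicon entries.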

Upvotes: 2
