mpuram

Reputation: 149

Case-sensitive search does not work for lowercase words with JSearch in MarkLogic

Case-sensitive search with JSearch in MarkLogic does not work for words that start with a lowercase letter. I tried the code below for a case-sensitive search using jsearch.documents().

Strangely, it works for capitalized words but not for lowercase ones.

For example, the code snippet below works for value = "Rabbit" but not for value = "rabbit".

Note: fast case sensitive searches are enabled in the database.

'use strict';
/* make sure fast case sensitive searches are enabled in the database 
*  create 3 documents, then run 2 jsearches, one with
*  rabbit and one with Rabbit, printing out the search estimates
*/

declareUpdate();
const jsearch = require("/MarkLogic/jsearch.sjs")
xdmp.documentInsert('/1.json',{species:'rabbit'},{collections : "testCollection"})
xdmp.documentInsert('/2.json',{species:'Rabbit'},{collections : "testCollection"})
xdmp.documentInsert('/3.json',{species:'Rabbit'},{collections : "testCollection"})

let value = "rabbit"
//let value = "Rabbit"
let query = cts.andQuery([
                cts.jsonPropertyWordQuery("species", value, ["case-sensitive","lang=en"], 3),
                cts.collectionQuery("testCollection")
            ], [])
let result = jsearch.documents().
                where(query).
                slice(0,2).                
                map({
                     snippet: true
                }).
                result()
result.estimate

The estimate comes back as 3 for "rabbit", but it should be 1.

I use result.estimate here only to show the number of matches; we also need the matching documents themselves from result.results.

Upvotes: 0

Views: 85

Answers (2)

Fiona Chen

Reputation: 1368

By default, MarkLogic search determines case sensitivity from the search text. If the searched text is all lowercase, MarkLogic performs a case-insensitive search; if the searched text contains uppercase, it performs a case-sensitive search. (An uppercase English word stems to itself.)
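To make that default rule concrete, here is a minimal plain-JavaScript sketch. It is not MarkLogic code: defaultCaseOption is a hypothetical helper written only for illustration, and it models the rule that applies when neither "case-sensitive" nor "case-insensitive" is passed as an option.

```javascript
// Hypothetical helper, not a MarkLogic API: models which case option a query
// falls back to when the caller specifies neither option explicitly.
function defaultCaseOption(text) {
  // Any uppercase character in the text selects case-sensitive matching;
  // entirely lowercase text selects case-insensitive matching.
  return text !== text.toLowerCase() ? "case-sensitive" : "case-insensitive";
}

console.log(defaultCaseOption("rabbit")); // all lowercase
console.log(defaultCaseOption("Rabbit")); // contains uppercase
```

Under this rule, a query for "Rabbit" with no options is already case-sensitive, while a query for "rabbit" needs the explicit "case-sensitive" option to be treated that way.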

If you want to change this default in jsearch, you have to use search options to constrain the query. However, jsearch does not support the full set of Search API properties and query options, so you have limited leeway to apply search options there. For example, the option in the cts.search call below has no effect on a query constructed in jsearch:

cts.search(cts.jsonPropertyValueQuery("species", "rabbit", "case-sensitive"));

The following should return the desired results if you opt for jsearch:

let query = cts.andQuery([
                        cts.jsonPropertyWordQuery("species", "Rabbit", "case-sensitive"),
                        cts.collectionQuery("testCollection")
                        ])
let result = jsearch.documents()
                    .where(query) 
                    .filter()
                    .result()
result.results;

Result:

{
  "index": 0, 
  "uri": "/1.json", 
  "score": 43008, 
  "confidence": 0.566326320171356, 
  "fitness": 0.677866637706757, 
  "document": {
    "species": "rabbits"
  }
}

Upvotes: 1

hunterhacker

Reputation: 7132

Here's what I think is actually going on. MarkLogic's case-sensitive word index doesn't store entries for entirely-lower-case words. This intentional decision makes the case-sensitive word index much smaller (because it only has to store entries for words that have some uppercase to them).

So what happens if you do a case-sensitive search for an entirely-lower-case word? Then the database uses the case-insensitive term list for that word. That will include versions of the word with case (as you saw), but those can be tossed during filtering. That's why the database has filtering.

The estimate will be a touch high, though, because the words with case will be included. That's one reason why it's called estimate and not count.

With JSearch I believe the default behavior is unfiltered, so you saw the extra results. With cts.search the default behavior is filtered, so Fiona's sample code would remove those extra results.
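As a rough illustration of the behavior described above, here is a toy model in plain JavaScript. It is not MarkLogic code and does not reflect the real index layout; buildIndexes, resolveUnfiltered, and resolveFiltered are hypothetical names, and the three documents mirror the ones in the question.

```javascript
// Toy model only: NOT MarkLogic code, all names here are hypothetical.
const docs = {
  "/1.json": "rabbit",
  "/2.json": "Rabbit",
  "/3.json": "Rabbit",
};

// Build two term lists: a case-insensitive one keyed by the lowercased word,
// and a case-sensitive one that only stores words containing uppercase.
function buildIndexes(docs) {
  const insensitive = new Map();
  const sensitive = new Map();
  const add = (map, key, uri) => {
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(uri);
  };
  for (const [uri, word] of Object.entries(docs)) {
    add(insensitive, word.toLowerCase(), uri);
    if (word !== word.toLowerCase()) add(sensitive, word, uri);
  }
  return { insensitive, sensitive };
}

// Unfiltered resolution: answer from the term lists alone. An all-lowercase
// word has no case-sensitive entry, so the case-insensitive list is used.
function resolveUnfiltered(indexes, word) {
  const hasUpper = word !== word.toLowerCase();
  const list = hasUpper
    ? indexes.sensitive.get(word)
    : indexes.insensitive.get(word);
  return list || [];
}

// Filtered resolution: re-check every candidate against the stored value.
function resolveFiltered(indexes, docs, word) {
  return resolveUnfiltered(indexes, word).filter((uri) => docs[uri] === word);
}

const indexes = buildIndexes(docs);
console.log(resolveUnfiltered(indexes, "rabbit").length); // 3 candidates
console.log(resolveFiltered(indexes, docs, "rabbit"));    // [ '/1.json' ]
console.log(resolveUnfiltered(indexes, "Rabbit").length); // 2 candidates
```

In this model the unfiltered lookup for "rabbit" returns all three documents, which matches the estimate of 3 in the question, and the filtering pass narrows it to the single exact match.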

To solve your results problem, you could add a filter clause to the JSearch call, which is a good idea when you know the indexes need a little help to achieve accuracy.

https://docs.marklogic.com/guide/search-dev/javascript#id_39247

Upvotes: 3
