Reputation: 3636
I was recently given the task of creating a search field for our MarkLogic database. The part of our XML that needs to be searched can look like this:
<title_group>
<title xml:lang="fr" source="sdo">Amendement 2 - Dispositifs à semiconducteurs - Partie 16-1: Circuits intégrés hyperfréquences - Amplificateurs</title>
<title xml:lang="en" source="sdo">Amendment 2 - Semiconductor devices - Part 16-1: Microwave integrated circuits - Amplifiers</title>
<title xml:lang="no">Tillegg 2 - Halvlederenheter - Del 16-1: Mikrobøgekretser - Forsterkere</title>
</title_group>
No element range index is currently configured for these elements in the Admin interface.
Now, in this particular case, I believe the hyphens are causing problems. I've tried:
let $searchTerm := fn:replace($title, "\s+-\s+", "* *")
let $searchTerm := fn:replace($searchTerm, "-", "* *")
but to little avail.
The current search is done as follows:
let $product_query:= cts:element-word-query(xs:QName("product:title"), fn:concat("*",$searchTerm,"*"), ("case-insensitive", "punctuation-insensitive"))
let $products := cts:search(/product:product, $product_query, ("filtered", $index_order))[1 to $result_limit]
This enables me to get a proper result when I search for "Tillegg 2" or "Tillegg 2 - Halvlederenheter", but it fails when I include anything more of the title. Do I need to preprocess the string into an and-query, or is there a smarter way?
Upvotes: 1
Views: 159
Reputation: 321
I'm not sure why something simpler doesn't work. With that XML doc in my database, I can get it back with
let $searchTerm := 'Tillegg 2 - Halvlederenheter - Del 16-1: Mikrobøgekretser'
let $product_query
:= cts:element-word-query(xs:QName("title"), $searchTerm, ('lang=no'))
return cts:search(/, $product_query)
Is that what you wanted?
I had to change/simplify a lot from what you posted. Also, lang=no might be treated as a generic language in v8, though that doesn't come into play exactly here. If you want the words to appear in any order (like your solution) then this seems to work:
let $searchTerm := 'Mikrobøgekretser Tillegg Halvlederenheter 2 - Halvlederenheter - Del 16'
let $words := fn:distinct-values (cts:tokenize ($searchTerm, 'lang=no')
! (if (. instance of cts:word) then . else ()))
let $product_query := cts:element-word-query(xs:QName("title"), $words,
('lang=no'))
return ($words, cts:search(/, $product_query))
Edit: sorry, that last query is an OR, not an AND: passing a sequence of words to cts:element-word-query matches any one of them. For an AND, you could get the words the same way, and then construct the cts:and-query as you did.
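A minimal sketch of that AND variant, assuming the same simplified setup as above (a title element in no namespace, and the 'lang=no' option). It wraps one cts:element-word-query per word in a cts:and-query. Two things here are my own choices rather than anything from the question: cts:tokenize is called without a language argument, so it uses the default language, and the search runs over fn:doc(). For the original documents you would use xs:QName("product:title") and /product:product instead.

xquery version "1.0-ml";

(: Sketch only: every word of the search string must occur in the title, in any order. :)
let $searchTerm := 'Mikrobøgekretser Tillegg Halvlederenheter 2 - Del 16'
(: Keep only word tokens; punctuation and whitespace tokens are dropped. :)
let $words := fn:distinct-values(
  cts:tokenize($searchTerm)[. instance of cts:word]
)
(: One word query per token, all of which must match. :)
let $product_query := cts:and-query(
  for $word in $words
  return cts:element-word-query(xs:QName("title"), $word, ('lang=no'))
)
return cts:search(fn:doc(), $product_query)

Because each word is matched independently, the hyphens and colons in the title no longer matter, but word order and adjacency are not enforced either.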
Upvotes: 0
Reputation: 3636
If anyone else happens to look for an answer to the same thing, this is how I solved it:
1. Call fn:normalize-space on the search string to trim it and collapse runs of whitespace.
2. Call fn:tokenize($searchString, '\s+') to get a list of search tokens.
3. Build a cts:and-query containing one cts:element-word-query per token. Each word query had the search options "case-insensitive", "punctuation-insensitive", "diacritic-insensitive", "whitespace-insensitive", "unstemmed", "unwildcarded".
Upvotes: 2