Reputation: 177
I ran some tests to find out which performs better: an element value query or a path range query.
I found that a search with a path range query is a little slower than one with an element value query.
Does anyone know why the path range query is slower, even though a path range index has been created for it?
I used the code below.
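(: the semicolon runs the delete as its own transaction; deleting and re-inserting the same URI in one statement would conflict :)
xdmp:document-delete("/aname4.xml");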
xdmp:document-insert("/aname1.xml",
<name><fname>John</fname><mname>Rob</mname><lname>Goldings</lname></name>),
xdmp:document-insert("/aname2.xml",
<name><fname>Jim</fname><mname>Ken</mname><lname>Kurla</lname></name>),
xdmp:document-insert("/aname3.xml",
<name><fname>Ooi</fname><mname>Ben</mname><lname>Fu</lname></name>),
xdmp:document-insert("/aname4.xml",
<name><fname>James</fname><mname>Rick</mname><lname>Tod</lname></name>)
I then created a path range index on "/name/fname"
and checked the response time with the following search code:
cts:search(doc(),cts:path-range-query("/name/fname","=","Jim"),"filtered")
cts:search(doc(),
cts:element-value-query(xs:QName("fname"),"jim"),
"filtered")
Is there anything specific I should consider when using a path range query?
Any suggestions would be highly appreciated, as they would help us design efficient search code.
Upvotes: 1
Views: 337
Reputation: 4912
In a filtered query, every candidate matching document needs to be walked to check for matches. To verify an element range match, we just need to look at the name of the element and then its contents (if the names match). To verify a path range match, we need to make sure the current element's name matches the end of the path, and then (in this case) that its parent element name matches and then that that parent element is at the root. That's not hugely more work, but it is more work. Indexing similarly needs to do a little more work to know which element contents to index.
But you are also doing an apples to oranges comparison in a different way: a value query is not the same as a range equality query because a value query is a full text query -- stemmed and tokenized and generally ignoring whitespace and punctuation -- and a range equality query is string comparison using a collation. For a simple value query a lot of the work can be done via keys, not string comparisons, but it will do extra stemming work. On the other hand, we are doing string comparisons on the range query side and for a non-codepoint collation those comparisons can be somewhat involved.
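To make the difference concrete, here is a minimal sketch that runs both kinds of query with the same lowercase string. It assumes the four sample documents from the question are loaded and that the string path range index on /name/fname uses the query's default collation; neither of those is something I have checked against your setup.

```xquery
xquery version "1.0-ml";

(: Assumption: sample documents are loaded and a string path range
   index on /name/fname exists with the default collation. :)
(
  (: Word-based match: all-lowercase query text is treated as
     case-insensitive by default, so "jim" can match <fname>Jim</fname>. :)
  fn:count(cts:search(fn:doc(),
    cts:element-value-query(xs:QName("fname"), "jim"),
    "filtered")),

  (: Collation-based string comparison: under a case-sensitive collation
     "jim" and "Jim" are different strings. :)
  fn:count(cts:search(fn:doc(),
    cts:path-range-query("/name/fname", "=", "jim"),
    "filtered"))
)
```

If the two counts differ, that is the tokenized, case-folded matching of the value query versus the plain string comparison of the range query, which is also why the two searches in the question (one with "Jim", one with "jim") are not doing equivalent work.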
Where the path range index would be a win is if you have documents with fname elements that aren't under name, so they can be excluded via index resolution and the filter never even needs to consider them.
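A rough way to see this is to compare index-only estimates. The sketch below assumes you have added a hypothetical extra document such as /person1.xml (my invention, not part of the question) whose fname is not under name, and that the /name/fname path range index exists.

```xquery
xquery version "1.0-ml";

(: Hypothetical extra document, inserted beforehand in its own transaction:
   xdmp:document-insert("/person1.xml", <person><fname>Jim</fname></person>) :)
(
  (: Candidates selected by the path range index: only fname under /name. :)
  xdmp:estimate(cts:search(fn:doc(),
    cts:path-range-query("/name/fname", "=", "Jim"))),

  (: Candidates selected by the element value index: any fname element. :)
  xdmp:estimate(cts:search(fn:doc(),
    cts:element-value-query(xs:QName("fname"), "Jim")))
)
```

xdmp:estimate reports how many fragments the indexes select without filtering, so the first number should be the smaller one when documents like /person1.xml exist. Fewer candidates means less work for the filter.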
My general advice here is:
1. Measure, because it is never what you think
2. Rule of thumb: Pick the least restrictive index that you need to make the distinctions you care about. That is, if all your fname elements are always under name, then don't put name in your path because it just adds work.
3. Value queries are just word queries with an added "must match the whole extent of the element" constraint; don't think of them as string equality. Use range indexes for string comparisons, but choose the most boring collation you can get away with for your use case.
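For the "measure" part, a minimal sketch is to run each query variant as its own request and return the elapsed time after the search has been fully evaluated. The query shown is just the one from the question; swap in whichever variant you are testing.

```xquery
xquery version "1.0-ml";

(: Run once per query variant; replace $query with the query under test. :)
let $query := cts:path-range-query("/name/fname", "=", "Jim")
return (
  (: fn:count forces the search to be evaluated completely. :)
  fn:count(cts:search(fn:doc(), $query, "filtered")),
  (: Time elapsed since this request started processing. :)
  xdmp:elapsed-time()
)
```

Repeat each run several times and on a realistic number of documents. With four documents and warm caches the differences are likely to be mostly noise.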
Upvotes: 6