Reputation: 41
I'm currently working on a xquery script to retrieve data of a marklogic database in form of a html table with the possibility to order and filter every column. The data is distributed in multiple directories as different documents with foreign keys just like a relational database would hold the data. Now that might not be the optimal solution, but I sadly can't change that for now.
The first solution was via cts:search on the main directory (smlaws) which came with the option to filter and sort for all elements and that worked quite well. But some information does not exist in those documents, so I've had to make multiple other queries to retrieve all necessary data and then sort and filter afterwards. That worked fine with a few records but not with 40.000.
Well I'm now trying to find a solution with TDEs and the optic api of marklogic. At first this looked great, but as soon as i started filtering and ordering I was not getting the expected results.
The results are so random, I don't understand what's going on. Directly after the document-insert all records are found. After a few minutes they are not anymore. If I order by a column I'm still getting all expected results. If I then order again by that column, but descending, i get zero results. This behaviour seems to change randomly.
I've checked if marklogic finished indexing.
I've created an element-range-index for each column that's filtered/sorted.
{
"localname": "identifier",
"scalar-type": "string",
"namespace-uri": "",
"collation": "http://marklogic.com/collation/en/S1",
"range-value-positions": false,
"invalid-values": "ignore"
}
Also I'd like to find a better way for performing a like-condition. The script is mostly dynamic, that means with the current solution I have to check myself, if the user input (filters) are fine to use. Filtering with cts:element-word-query worked better, but the results were unfiltered which might be a problem of missing indicies, but I'm not sure if I have to had additional ones.
xquery version "1.0-ml";
import module namespace op = "http://marklogic.com/optic" at "/MarkLogic/optic.xqy";
let $smlawsview := op:from-view("main", "smlaws")
let $sfpublicationchannelsview := op:from-view("main", "sfpublicationchannels")
let $sfstatusview := op:from-view("main", "sfstatus")
let $smlawareasview := op:from-view("main", "smlawareas")
let $maps := $smlawsview
let $maps := op:where($maps, op:sql-condition(fn:concat("smlaws.identifier LIKE '%identifier%'")))
let $maps := op:join-inner($maps, $sfpublicationchannelsview, op:on("fksfpublicationchannels", op:view-col("sfpublicationchannels", "pksfpublicationchannels")))
let $maps := op:where($maps, op:eq(op:view-col("sfpublicationchannels", "fksfstatus"), "1"))
let $maps := op:join-inner($maps, $sfstatusview, op:on(op:view-col("sfpublicationchannels", "fksfstatus"), op:view-col("sfstatus", "pksfstatus")))
let $maps := op:join-inner($maps, $smlawareasview, op:on(op:view-col("smlaws", "fksmlawareas"), op:view-col("smlawareas", "pksmlawareas")))
let $maps := op:select($maps, (
op:view-col("smlaws", "pksmlaws"),
op:view-col("smlaws", "fksfpublicationchannels"),
op:view-col("smlaws", "fksmlawareas"),
op:view-col("smlaws", "identifier"),
op:view-col("sfpublicationchannels", "fksfstatus"),
op:view-col("sfstatus", "pksfstatus"),
op:view-col("sfstatus", "identifier"),
op:view-col("smlawareas", "pksmlawareas"),
op:view-col("smlawareas", "identifier")
))
let $maps := op:order-by($maps, op:asc(op:view-col("smlaws", "identifier")))
let $maps := op:offset-limit($maps, op:param("offset"), op:param("limit"))
let $maps := op:result($maps, (),
map:entry("offset", 0)
=> map:with("limit", 10)
)
return for $map in $maps
return element row {
element id {map:get($map, "main.smlaws.pksmlaws")},
element fksfpublicationchannels {map:get($map, "main.smlaws.fksfpublicationchannels")},
element identifier {map:get($map, "main.smlaws.identifier")},
element status {map:get($map, "main.sfstatus.pksfstatus")},
element fksfstatus {map:get($map, "main.sfstatus.identifier")},
element lawarea {map:get($map, "main.smlawareas.pksmlawareas")},
element fksmlawareas {map:get($map, "main.smlawareas.identifier")}
}
<smlaws>
<id>1</id>
<identifier>law 1</identifier>
<fksfpublicationchannels>1</fksfpublicationchannels>
<fksmlawareas>1</fksmlawareas>
</smlaws>
<sfpublicationchannels>
<id>1</id>
<identifier>publicationchannel 1</identifier>
<fksfstatus>1</fksfstatus>
</sfpublicationchannels>
<sfstatus>
<id>1</id>
<identifier>status 1</identifier>
</sfstatus>
<smlawareas>
<id>1</id>
<identifier>lawarea 1</identifier>
</smlawareas>
<tde:template xmlns:tde='http://marklogic.com/xdmp/tde' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://marklogic.com/xdmp/tde '>
<tde:description>TDE view for smlaws</tde:description>
<tde:context>/envelope/instance/smlaws</tde:context>
<tde:enabled>true</tde:enabled>
<tde:rows>
<tde:row>
<tde:schema-name>main</tde:schema-name>
<tde:view-name>smlaws</tde:view-name>
<tde:columns>
<tde:column>
<tde:name>pksmlaws</tde:name>
<tde:scalar-type>int</tde:scalar-type>
<tde:val>id</tde:val>
<tde:nullable>false</tde:nullable>
<tde:collation>http://marklogic.com/collation/en/S1</tde:collation>
</tde:column>
<tde:column>
<tde:name>fksmlawareas</tde:name>
<tde:scalar-type>int</tde:scalar-type>
<tde:val>fksmlawareas</tde:val>
<tde:nullable>true</tde:nullable>
<tde:collation>http://marklogic.com/collation/en/S1</tde:collation>
</tde:column>
<tde:column>
<tde:name>identifier</tde:name>
<tde:scalar-type>string</tde:scalar-type>
<tde:val>identifier</tde:val>
<tde:nullable>true</tde:nullable>
<tde:collation>http://marklogic.com/collation/en/S1</tde:collation>
</tde:column>
<tde:column>
<tde:name>fksfpublicationchannels</tde:name>
<tde:scalar-type>int</tde:scalar-type>
<tde:val>fksfpublicationchannels</tde:val>
<tde:nullable>false</tde:nullable>
<tde:collation>http://marklogic.com/collation/en/S1</tde:collation>
</tde:column>
</tde:columns>
</tde:row>
</tde:rows>
</tde:template>
Upvotes: 4
Views: 446
Reputation: 7335
For the variability in the result set, the best step is likely to raise a support ticket to start an investigation.
For user-supplied filtering criteria, consider
op:where(op:sql-condition(...criteria...))
as described here: https://docs.marklogic.com/op:sql-condition
If the filtering criteria can be applied prior to some of the joins, the query might perform better by the following sequence of operations:
By joining a smaller data set, that reduces the work. If the columns used for filtering could be denormalized into the source documents for the smlaws view so filtering could happen before any joins, that would minimize the number of joins.
Hoping that helps,
Upvotes: 1