Alma Pellett
Alma Pellett

Reputation: 314

How can I cts query on two values?

I'm trying to work out a nice cts query that matches two nodes, rather than one. For example, I have records from two sources, both with an ID value and a dateTime value. I'd like to find records in the first source that has a matching ID in the second source and a newer dateTime value.

Something like this (does not work):

cts:uris(
  (),
  (),
  cts:and-query((
    cts:collection-query("source1"),
    cts:path-range-query(
      "/record/ID",
      "=",
      cts:values(
        cts:path-reference("/record/ID"),
        (),
        (),
        cts:collection-query("source2")
      )
    ),
    cts:path-range-query(
      "/record/dateTimeValue",
      ">",
      cts:values(
        cts:path-reference("/record/dateTimeValue"),
        (),
        (),
        cts:collection-query("source2")
      )
    )
  ))
)

This wont work because it returns records that have an equal ID value and where there also exists a record with a greater dateTimeValue

How do I make the cts query match on two values? Can I only do this through a FLWOR?

Upvotes: 0

Views: 244

Answers (3)

BenW
BenW

Reputation: 433

You can do this with a cts query by searching source1 for every combination found in source2 (or vice versa). I don't know how performant it would be... it doesn't seem like it should be worse than getting two co-occurrence maps and doing it manually.

cts:uris(
    (),
    (),
    cts:and-query((
        cts:collection-query("source1"),
        cts:or-query((
            for $tuple in cts:value-co-occurrences(
                cts:path-reference("/record/ID"),
                cts:path-reference("/record/dateTimeValue"),
                (), 
                cts:collection-query("source2")
            )
            return cts:and-query((
                cts:path-range-query("/record/ID", "=", $tuple/cts:value[1]),
                cts:path-range-query("/record/dateTimeValue", ">", $tuple/cts:value[2])
            ))
        ))
    )
)

If the ID overlap between source1 and source2 is small, then it's probably better to find the overlap first and plug those IDs into the co-occurrence query, so it isn't scatter-querying so widely.

Upvotes: 1

Fiona Chen
Fiona Chen

Reputation: 1368

This does the work:

let $local:cL :=  function($dt as xs:dateTime, $id as xs:string)
{  
  let $query :=
          cts:and-query((
                cts:collection-query("source1"),
                cts:path-range-query("/record/dateTimeValue", ">", $dt),
                cts:path-range-query("/record/ID", "=", $id)
          ))
   for $uri in cts:uris("", "document", $query)
   return 
     <uri>{$uri}</uri>
}
let $docs := cts:search(doc(), cts:collection-query("source2"))
for $doc in $docs
return 
  xdmp:apply( $local:cL, $doc/record/dateTimeValue, $doc/record/ID )

Upvotes: 0

ehennum
ehennum

Reputation: 7335

If I understand the requirement correctly, this query could be implemented efficiently with a join:

  • Create a TDE view scoped to the collection with a context of /record that projects id and datetime columns
  • Use an Optic query to join the view with itself on the id with a condition of a greater datetime and then join the matching documents (or project all of the needed columns from the record)

Something like the following:

  const docid = op.fragmentIdCol('docid');
  const v1    = op.fromView(null,"source2", "v1", docid);
  const v2    = op.fromView(null,"source2", "v2");
  v1.joinInner(v2, op.on(v1.col("id"), v2.col("id"),
       op.gt(v1.col("datetime"), v2.col("datetime")))
    .select(docid)
    .joinDoc('doc', 'docid')
    .result();

For more detail, see: https://docs.marklogic.com/ModifyPlan.prototype.joinInner

Hoping that helps,

Upvotes: 2

Related Questions