Manoj
Manoj

Reputation: 31

Optimize retrieval of records when quering between multiple collections in marklogic

I am trying to query between 2 collections having a join condition between them, Say the 2 documents are:

Collection 1: Company (1M documents)

Company Documents:

<Company>
 <CompanyName>XYZ</CompanyName>  
 <Description>ABC</Description>
  .
  .
</Company>

Collection 2: Employee (1M documents)

XML:

<Employee>
 <EmpName>john</EmpName>  
 <CompanyName>XYZ</CompanyName>  
  .
  .
</Employee>

The Query I am using is,

for $company in collection('Company'), $employee in collection('Employee')
where ( $company/Company/CompanyName eq $employee/Employee/CompanyName)
return document-uri($company)

when there are less documents in both collections, able to fetch the records, but when i do stress testing say, 1M records in both collections, in this case takes a lot of time and results in time limit exceeded exception. Is there a way to optimize this query?

I have tried the following methods, assuming there exists a range index for CompanyName,

for $company in collection('Company'), $employee in cts:values(cts:path-reference('/Employee/CompanyName'))
where ( $company/Company/CompanyName eq $employee)
return document-uri($company)

if companyName is unique then the above Query will not optimize much. I need a way which can work for both distinct and duplicate value of elements

Upvotes: 3

Views: 81

Answers (1)

Mads Hansen
Mads Hansen

Reputation: 66723

Leveraging indexes and cts:* functions can speed this up dramatically.

Use cts:element-values() to retrieve the set of CompanyName values from the documents in the Employee collection.

Use cts:uris() to retrieve the set of document URIs from the Company collection that have CompanyName values from the set of values from the Employee documents using cts:element-range-query().

let $CompanyName := xs:QName("CompanyName")
let $employee-companies := cts:element-values($CompanyName, "", (), cts:collection-query("Employee"))
return
  cts:uris("", (), cts:and-query((
    cts:collection-query("Company"), 
    cts:element-range-query($CompanyName, "=", $employee-companies) 
  )))

Upvotes: 3

Related Questions