Reputation: 31
I am trying to query between 2 collections having a join condition between them, Say the 2 documents are:
Collection 1: Company (1M documents)
Company Documents:
<Company>
<CompanyName>XYZ</CompanyName>
<Description>ABC</Description>
.
.
</Company>
Collection 2: Employee (1M documents)
XML:
<Employee>
<EmpName>john</EmpName>
<CompanyName>XYZ</CompanyName>
.
.
</Employee>
The Query I am using is,
for $company in collection('Company'), $employee in collection('Employee')
where ( $company/Company/CompanyName eq $employee/Employee/CompanyName)
return document-uri($company)
when there are less documents in both collections, able to fetch the records, but when i do stress testing say, 1M records in both collections, in this case takes a lot of time and results in time limit exceeded exception. Is there a way to optimize this query?
I have tried the following methods, assuming there exists a range index for CompanyName,
for $company in collection('Company'), $employee in cts:values(cts:path-reference('/Employee/CompanyName'))
where ( $company/Company/CompanyName eq $employee)
return document-uri($company)
if companyName is unique then the above Query will not optimize much. I need a way which can work for both distinct and duplicate value of elements
Upvotes: 3
Views: 81
Reputation: 66723
Leveraging indexes and cts:* functions can speed this up dramatically.
Use cts:element-values()
to retrieve the set of CompanyName
values from the documents in the Employee
collection.
Use cts:uris()
to retrieve the set of document URIs from the Company
collection that have CompanyName
values from the set of values from the Employee documents using cts:element-range-query()
.
let $CompanyName := xs:QName("CompanyName")
let $employee-companies := cts:element-values($CompanyName, "", (), cts:collection-query("Employee"))
return
cts:uris("", (), cts:and-query((
cts:collection-query("Company"),
cts:element-range-query($CompanyName, "=", $employee-companies)
)))
Upvotes: 3