Yahya Uddin
Yahya Uddin

Reputation: 28841

Efficiently grouping elements that exists in both documents (inner join) in Xquery

I have the following data:

<Subjects>
    <Subject>
        <Id>1</Id>
        <Name>Maths</Name>
    </Subject>
    <Subject>
        <Id>2</Id>
        <Name>Science</Name>
    </Subject>
    <Subject>
        <Id>2</Id>
        <Name>Advanced Science</Name>
    </Subject>
</Subjects>

and:

<Courses>
    <Course>
        <SubjectId>1</SubjectId>
        <Name>Algebra I</Name>
    </Course>
    <Course>
        <SubjectId>1</SubjectId>
        <Name>Algebra II</Name>
    </Course>
    <Course>
        <SubjectId>1</SubjectId>
        <Name>Percentages</Name>
    </Course>
    <Course>
        <SubjectId>2</SubjectId>
        <Name>Physics</Name>
    </Course>
    <Course>
        <SubjectId>2</SubjectId>
        <Name>Biology</Name>
    </Course>
</Courses>

I wish to efficiently get elements from both documents that share the share the same Ids.

I want to get the result like this:

<Results>
    <Result>
        <Table1>
            <Subject>
                <Id>1</Id>
                <Name>Maths</Name>
            </Subject>
        </Table1>
        <Table2>
            <Course>
                <SubjectId>1</SubjectId>
                <Name>Algebra I</Name>
            </Course>
            <Course>
                <SubjectId>1</SubjectId>
                <Name>Algebra II</Name>
            </Course>
            <Course>
                <SubjectId>1</SubjectId>
                <Name>Percentages</Name>
            </Course>
        </Table2>
    </Result>
    <Result>
        <Table1>
            <Subject>
                <Id>2</Id>
                <Name>Science</Name>
            </Subject>
            <Subject>
                <Id>2</Id>
                <Name>Advanced Science</Name>
            </Subject>
        </Table1>
        <Table2>
            <Course>
                <SubjectId>2</SubjectId>
                <Name>Physics</Name>
            </Course>
            <Course>
                <SubjectId>2</SubjectId>
                <Name>Biology</Name>
            </Course>
        </Table2>
    </Result>
</Results>

So far I have 2 solutions:

<Results>       
{
   for $e2 in $t2/Course
   let $foriegnId := $e2/SubjectId
   group by $foriegnId
   let $e1 := $t1/Subject[Id = $foriegnId]
   where $e1
   return
      <Result>
         <Table1>
            {$e1}
         </Table1>
         <Table2>
            {$e2}
         </Table2>
      </Result>
}
</Results>

and the otherway round:

<Results>       
{
   for $e1 in $t1/Subject
   let $id := $e1/Id
   group by $id
   let $e2 := $t2/Course[SubjectId = $id]
   where $e2
   return
      <Result>
         <Table1>
            {$e1}
         </Table1>
         <Table2>
            {$e2}
         </Table2>
      </Result>
}
</Results>

Is there a more efficient way of doing this? Perhaps taking advantages of multiple groups?

Update A major issue with my code at the moment is that it's performance is highly dependent on which table is bigger. For example the 1st solution is better in cases where the 2nd table is bigger and vice versa.

Upvotes: 1

Views: 49

Answers (1)

Michael Kay
Michael Kay

Reputation: 163262

The solution you have looks reasonable to me. It will perform siginificantly better on a processor like Saxon-EE that does join optimization than on one (like Saxon-HE) that doesn't. If you want to hand-optimize it, your simplest approach is to switch to using XSLT: use the key() function to replace the filter expression $t1/Subject[Id = $foriegnId] which, in the absence of optimization, searches your second file once for each element selected in the first file.

Upvotes: 2

Related Questions