Efficiently grouping elements that exist in both documents (inner join) in XQuery

Question

I have the following data:


    
        1
        Maths
    
    
        2
        Science
    
    
        2
        Advanced Science

and:


    
        1
        Algebra I
    
    
        1
        Algebra II
    
    
        1
        Percentages
    
    
        2
        Physics
    
    
        2
        Biology

I wish to efficiently get the elements from both documents that share the same Ids.

I want to get the result like this:


    
        
            
                1
                Maths
            
        
        
            
                1
                Algebra I
            
            
                1
                Algebra II
            
            
                1
                Percentages
            
        
    
    
        
            
                2
                Science
            
            
                2
                Advanced Science
            
        
        
            
                2
                Physics
            
            
                2
                Biology

So far I have 2 solutions:

       
{
   for $e2 in $t2/Course
   let $foriegnId := $e2/SubjectId
   group by $foriegnId
   let $e1 := $t1/Subject[Id = $foriegnId]
   where $e1
   return
      
         
            {$e1}
         
         
            {$e2}
         
      
}

and the other way round:

       
{
   for $e1 in $t1/Subject
   let $id := $e1/Id
   group by $id
   let $e2 := $t2/Course[SubjectId = $id]
   where $e2
   return
      
         
            {$e1}
         
         
            {$e2}
         
      
}

Is there a more efficient way of doing this? Perhaps by taking advantage of multiple groups?

Update: A major issue with my code at the moment is that its performance is highly dependent on which table is bigger. For example, the 1st solution is better in cases where the 2nd table is bigger, and vice versa.

Michael Kay · Accepted Answer

The solution you have looks reasonable to me. It will perform significantly better on a processor like Saxon-EE that does join optimization than on one (like Saxon-HE) that doesn't. If you want to hand-optimize it, your simplest approach is to switch to using XSLT: use the key() function to replace the filter expression $t1/Subject[Id = $foriegnId] which, in the absence of optimization, searches your second file once for each element selected in the first file.
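If staying in XQuery is preferable, a similar hand-built index can be sketched with XQuery 3.1 maps. This is not the XSLT key() approach suggested above but a substitute technique, and it assumes a processor that supports XQuery 3.1. The wrapper element names Group, Subjects and Courses are hypothetical placeholders, since the original wrapper names are not shown, and $t1 and $t2 are assumed to be bound as in the question.

```xquery
xquery version "3.1";

declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Assumption: $t1 and $t2 are supplied by the caller, as in the question. :)
declare variable $t1 external;
declare variable $t2 external;

(: Build the index once: SubjectId -> all Course elements with that SubjectId.
   'duplicates': 'combine' keeps every Course for a repeated key. :)
let $coursesBySubject := map:merge(
  for $c in $t2/Course
  return map:entry(string($c/SubjectId), $c),
  map { 'duplicates': 'combine' }
)
for $e1 in $t1/Subject
let $id := string($e1/Id)
group by $id
(: Map lookup replaces the filter $t2/Course[SubjectId = $id];
   a missing key yields the empty sequence. :)
let $e2 := $coursesBySubject($id)
where exists($e2)
return
  (: Group, Subjects and Courses are hypothetical element names. :)
  <Group>
    <Subjects>{$e1}</Subjects>
    <Courses>{$e2}</Courses>
  </Group>
```

With this shape each document is scanned once, once to build the map and once to group, so the cost should depend less on which of the two inputs is larger. Whether it actually beats an optimizing processor's own join strategy would need to be measured.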
