Improve SQL query for xpath Evaluating

Question

Database

I've got database for saving XML document. Database looks like this:

So I can save any XML file to my universal database.

XPath Query

Then I translate XPATH query to SQL query, for selecting elements.

examples of translated xpath queries:

1) //EMPTY[./PERIOD]

 SELECT e2.docId
 , e2.startPos
 , e2.endPos
 , p2.NodeName
 , p2.levelEl
 , p2.pathID

 From Path p2
 , Element e2
 , Path p3
 , Element e3

 WHERE e2.docID = p2.docID
 AND e2.pathID = p2.pathID
 AND p2.NodeName =  'EMPTY'
 AND p2.levelEl >= 1
 AND e3.docID = p3.docID
 AND e3.pathID = p3.pathID
 AND p3.NodeName =  '_PERIOD_'
 AND e2.startPos < e3.startPos
 AND e2.endPos > e3.endPos
 AND e2.docId = e3.docId
 AND p2.levelEl = p3.levelEl - 1

 AND e2.docId
 = 3147524262 GROUP BY e2.docId
 , e2.startPos
 , e2.endPos
 , p2.NodeName
 , p2.levelEl
 , p2.pathID

 ORDER BY startPos;

2) //EMPTY[./PERIOD]/S/NP

 SELECT e5.docId
 , e5.startPos
 , e5.endPos
 , p5.NodeName
 , p5.levelEl
 , p5.pathID

 From Path p2
 , Element e2
 , Path p3
 , Element e3
 , Path p4
 , Element e4
 , Path p5
 , Element e5

 WHERE e2.docID = p2.docID
 AND e2.pathID = p2.pathID
 AND p2.NodeName =  'EMPTY'
 AND p2.levelEl >= 1
 AND e3.docID = p3.docID
 AND e3.pathID = p3.pathID
 AND p3.NodeName =  '_PERIOD_'
 AND e2.startPos < e3.startPos
 AND e2.endPos > e3.endPos
 AND e2.docId = e3.docId
 AND p2.levelEl = p3.levelEl - 1
 AND e4.docID = p4.docID
 AND e4.pathID = p4.pathID
 AND p4.NodeName =  'S'
 AND e2.startPos < e4.startPos
 AND e2.endPos > e4.endPos
 AND e2.docId = e4.docId
 AND p2.levelEl = p4.levelEl - 1
 AND e5.docID = p5.docID
 AND e5.pathID = p5.pathID
 AND p5.NodeName =  'NP'
 AND e4.startPos < e5.startPos
 AND e4.endPos > e5.endPos
 AND e4.docId = e5.docId
 AND p4.levelEl = p5.levelEl - 1

 AND e5.docId
 = 3147524262 GROUP BY e5.docId
 , e5.startPos
 , e5.endPos
 , p5.NodeName
 , p5.levelEl
 , p5.pathID

 ORDER BY startPos;

Problem

As we can see, with every other node in xpath, Iam adding a table element and path into the FROM statements, which represents that node, with some where conditions to determine where this node should be located.

Table element has about 145 000 rows.
Table path has about 41 000 rows.

The problem is, that Query 1) runs pretty fast, but with every new node, the sql is much slower. For example Query 1) takes 24ms but Query 2) Runs something about 5 minutes.

I have added indexes, so Query Plan uses only index seek (according to Microsoft sql server).

Question

Do you know of any solution how to otherwise generate SQL query to make execution faster? Or some improvements for an existing query (DB table engines, etc.)?

Iam creating TREE from Xpath which looks like this

ROOT-ROOT   (type: ROOT) False                                                                                            
   //-EMPTY   (type: NODE) False                                                                                             
       /-_PERIOD_   (type: NODE) False                                                                                         
       /-S   (type: NODE) False                                                                                                  
          /-NP   (type: NODE) True

And from that tree i Generate the SQL

Radim Bača · Accepted Answer

There is a lot of research on this topic and your approach will inevitably lead to a lot of self joins which is slow. It is very close to this solution. I recommend you to use some native XQuery database such as BaseX, or Saxon which are inherently optimized to process the XQueries without the need to rewrite them into the SQL.

However, if you really want to rewrite XQueries into SQL then read for example XPath accelerator proposed by Torsten Grust. Ideas behind his work are implemented in MonetDB XQuery engine. He is using slightly different labeling scheme than you, but I guess the ideas can be implemented in your approach as well.

Improve SQL query for xpath Evaluating

Database

XPath Query

Problem

Question

Answers (1)

Related Questions