Reputation: 506
I've got database for saving XML document. Database looks like this:
So I can save any XML file to my universal database.
Then I translate XPATH query to SQL query, for selecting elements.
1) //EMPTY[./PERIOD]
SELECT e2.docId
, e2.startPos
, e2.endPos
, p2.NodeName
, p2.levelEl
, p2.pathID
From Path p2
, Element e2
, Path p3
, Element e3
WHERE e2.docID = p2.docID
AND e2.pathID = p2.pathID
AND p2.NodeName = 'EMPTY'
AND p2.levelEl >= 1
AND e3.docID = p3.docID
AND e3.pathID = p3.pathID
AND p3.NodeName = '_PERIOD_'
AND e2.startPos < e3.startPos
AND e2.endPos > e3.endPos
AND e2.docId = e3.docId
AND p2.levelEl = p3.levelEl - 1
AND e2.docId
= 3147524262 GROUP BY e2.docId
, e2.startPos
, e2.endPos
, p2.NodeName
, p2.levelEl
, p2.pathID
ORDER BY startPos;
2) //EMPTY[./PERIOD]/S/NP
SELECT e5.docId
, e5.startPos
, e5.endPos
, p5.NodeName
, p5.levelEl
, p5.pathID
From Path p2
, Element e2
, Path p3
, Element e3
, Path p4
, Element e4
, Path p5
, Element e5
WHERE e2.docID = p2.docID
AND e2.pathID = p2.pathID
AND p2.NodeName = 'EMPTY'
AND p2.levelEl >= 1
AND e3.docID = p3.docID
AND e3.pathID = p3.pathID
AND p3.NodeName = '_PERIOD_'
AND e2.startPos < e3.startPos
AND e2.endPos > e3.endPos
AND e2.docId = e3.docId
AND p2.levelEl = p3.levelEl - 1
AND e4.docID = p4.docID
AND e4.pathID = p4.pathID
AND p4.NodeName = 'S'
AND e2.startPos < e4.startPos
AND e2.endPos > e4.endPos
AND e2.docId = e4.docId
AND p2.levelEl = p4.levelEl - 1
AND e5.docID = p5.docID
AND e5.pathID = p5.pathID
AND p5.NodeName = 'NP'
AND e4.startPos < e5.startPos
AND e4.endPos > e5.endPos
AND e4.docId = e5.docId
AND p4.levelEl = p5.levelEl - 1
AND e5.docId
= 3147524262 GROUP BY e5.docId
, e5.startPos
, e5.endPos
, p5.NodeName
, p5.levelEl
, p5.pathID
ORDER BY startPos;
As we can see, with every other node in xpath, Iam adding a table element and path into the FROM statements, which represents that node, with some where conditions to determine where this node should be located.
The problem is, that Query 1) runs pretty fast, but with every new node, the sql is much slower. For example Query 1) takes 24ms but Query 2) Runs something about 5 minutes.
I have added indexes, so Query Plan uses only index seek (according to Microsoft sql server).
Do you know of any solution how to otherwise generate SQL query to make execution faster? Or some improvements for an existing query (DB table engines, etc.)?
Iam creating TREE from Xpath which looks like this
ROOT-ROOT (type: ROOT) False
//-EMPTY (type: NODE) False
/-_PERIOD_ (type: NODE) False
/-S (type: NODE) False
/-NP (type: NODE) True
And from that tree i Generate the SQL
Upvotes: 1
Views: 391
Reputation: 10701
There is a lot of research on this topic and your approach will inevitably lead to a lot of self joins which is slow. It is very close to this solution. I recommend you to use some native XQuery database such as BaseX, or Saxon which are inherently optimized to process the XQueries without the need to rewrite them into the SQL.
However, if you really want to rewrite XQueries into SQL then read for example XPath accelerator proposed by Torsten Grust. Ideas behind his work are implemented in MonetDB XQuery engine. He is using slightly different labeling scheme than you, but I guess the ideas can be implemented in your approach as well.
Upvotes: 4