Reputation: 7983
I am trying to find a way to divide a large XML file into chunks based on XPath expressions.
As I understand it, only XPath expressions that select nodes sharing the same parent can be used to divide the XML file into chunks. How can I detect whether the XPath expression a user enters will select nodes that share the same parent?
For example, consider the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<employees>
  <employee>
    <firstname>Asanka</firstname>
    <lastname>Sanjeewa</lastname>
    <address>
      <no>No.123</no>
      <road>Main Street</road>
      <city>Negombo</city>
    </address>
  </employee>
  <employee>
    <firstname>Kamal</firstname>
    <lastname>Silva</lastname>
    <address>
      <no>No.123</no>
      <road>Main Street</road>
      <city>Negombo</city>
    </address>
  </employee>
  <employee>
    <firstname>Roshan</firstname>
    <lastname>Fernando</lastname>
    <address>
      <no>No.123</no>
      <road>Main Street</road>
      <city>Negombo</city>
    </address>
  </employee>
</employees>
If I am given the XPath expression //employees/employee/firstname, the selected firstname nodes have different parents. But if I am given the XPath expression //employees/employee, the resulting nodes all have the same parent. How can I detect XPath expressions that select nodes with the same parent?
Upvotes: 0
Views: 69
Reputation: 16095
Take the XPath expression entered by your user and enclose it in parentheses. Then append /.. to the end and wrap the whole thing in the count() function. Because the result of a location step is a node-set, which never contains duplicates, this gives you the number of distinct parent nodes of the nodes returned by the original query. If the answer is 1, you know that the resulting nodes all have the same parent. If it is more than 1, you know that you can't split the XML by the given XPath expression.
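The rewrite itself is plain string manipulation. A minimal Python sketch, assuming the user's expression arrives as a string; the function name parent_count_expr is hypothetical. The resulting string can then be evaluated by any full XPath 1.0 engine that returns numbers (for example javax.xml.xpath in Java, or the xpath() method in Python's lxml); Python's built-in xml.etree.ElementTree supports only a subset of XPath and cannot evaluate count().

```python
def parent_count_expr(user_expr: str) -> str:
    """Wrap a user-supplied XPath 1.0 expression so that evaluating the
    result returns the number of distinct parents of the selected nodes.

    parent_count_expr is a hypothetical helper name, not a library API.
    """
    # Parenthesize first so that "/.." applies to the whole expression,
    # including every branch of a union such as "a | b".
    return f"count(({user_expr})/..)"


print(parent_count_expr("//employees/employee/firstname"))
# count((//employees/employee/firstname)/..)
```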
Examples:
//employees/employee/firstname turns into count((//employees/employee/firstname)/..) and gives the result 3.
//employees/employee turns into count((//employees/employee)/..) and gives the result 1.
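If no full XPath 1.0 engine is available, the same quantity can be computed by hand. The sketch below uses Python's standard xml.etree.ElementTree, which has no count() and no leading //, so it assumes the user's expressions have been translated into ElementTree's relative path syntax (e.g. .//employees/employee); the helper name distinct_parent_count and the trimmed sample document are illustrative. ElementTree elements do not store a reference to their parent, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (addresses omitted).
SAMPLE = """<employees>
  <employee><firstname>Asanka</firstname><lastname>Sanjeewa</lastname></employee>
  <employee><firstname>Kamal</firstname><lastname>Silva</lastname></employee>
  <employee><firstname>Roshan</firstname><lastname>Fernando</lastname></employee>
</employees>"""


def distinct_parent_count(root: ET.Element, path: str) -> int:
    """Emulate count((path)/..) for an ElementTree path (hypothetical helper)."""
    # A throwaway wrapper stands in for the XPath document node, so that
    # ".//employees/..." can match the document element itself.
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in wrapper.iter() for child in parent}
    selected = wrapper.findall(path)
    # A set drops duplicate parents, as an XPath node-set would.
    return len({parent_of[node] for node in selected})


root = ET.fromstring(SAMPLE)
print(distinct_parent_count(root, ".//employees/employee/firstname"))  # 3
print(distinct_parent_count(root, ".//employees/employee"))  # 1
```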
From those examples, enclosing the original XPath expression in parentheses may seem unnecessary. However, unless you know that your users won't enter an expression like //firstname | //employee, it is important: the union operator | binds more loosely than /, so without the parentheses the /.. would apply only to the last branch (count(//firstname | //employee/..)), and the result would not be the number of parents of all the selected nodes.
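To see what the parenthesized union form computes, here is a minimal sketch using Python's standard xml.etree.ElementTree. ElementTree supports neither | nor count(), so the union is emulated by collecting the nodes selected by each branch (written in ElementTree's relative path syntax) before applying "/.." to all of them; the helper name union_parent_count and the trimmed sample are illustrative. For the sample document, (//firstname | //employee)/.. yields the three employee elements plus the employees element, so the count is 4 and the expression cannot be used to split the file.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question.
SAMPLE = """<employees>
  <employee><firstname>Asanka</firstname></employee>
  <employee><firstname>Kamal</firstname></employee>
  <employee><firstname>Roshan</firstname></employee>
</employees>"""


def union_parent_count(root: ET.Element, *paths: str) -> int:
    """Emulate count((p1 | p2 | ...)/..) (hypothetical helper)."""
    # The wrapper stands in for the XPath document node.
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    parent_of = {child: parent for parent in wrapper.iter() for child in parent}
    # Union of the nodes selected by every branch, duplicates removed.
    selected = {node for path in paths for node in wrapper.findall(path)}
    # Apply "/.." to the whole union, then count distinct parents.
    return len({parent_of[node] for node in selected})


root = ET.fromstring(SAMPLE)
# (//firstname | //employee)/.. -> three <employee> parents plus <employees>
print(union_parent_count(root, ".//firstname", ".//employee"))  # 4
```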
Upvotes: 2