Reputation: 497
I have an XML file that follows this DTD structure:
<!DOCTYPE report [
<!ELEMENT report (title,section+)>
<!ELEMENT section (title,body?,section*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (para+)>
<!ELEMENT para (#PCDATA)>
<!ATTLIST report version CDATA #REQUIRED>
<!ATTLIST section number CDATA #REQUIRED>
]>
And I want to query the following two things using XQuery.
1. Get all titles that appear at least twice (two sections with same title).
for $x in /report/section/
for $y in /report/section/
where $x/@title = $y/@title
return $x/@title
2. Get the number and titles of all sections with at least 10 paragraphs in the body or 5 nested sections.
for $x in /report/section/
where $x/para >= 10 or count(/section) > 10
return <large>$x/number $x/title</large>
But my queries don't seem to be correct. I am a beginner with XQuery and XPath. Could someone tell me how to fix my queries?
Edit: Sample XML
<?xml version="1.0" encoding="UTF-8"?>
<report version = '1'>
<title>Harry Potter</title>
<section number = '1'>
<title>sec1</title>
<body>
<para>1</para>
<para>2</para>
<para>3</para>
<para>4</para>
<para>5</para>
<para>6</para>
<para>7</para>
<para>8</para>
<para>9</para>
<para>10</para>
<para>11</para>
</body>
</section>
<section number = '2'>
<title>sec2</title>
<body><para>test</para></body>
<section number = '2.1'>
<title>sec21</title>
<body>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
</body>
</section>
<section number = '2.2'>
<title>sec21</title>
<body><para>test</para></body>
</section>
<section number = '2.3'>
<title>sec23</title>
<body><para>test</para></body>
</section>
<section number = '2.4'>
<title>sec24</title>
<body><para>test</para></body>
</section>
<section number = '2.5'>
<title>sec25</title>
<body><para>test</para></body>
</section>
<section number = '2.6'>
<title>sec1</title>
<body><para>test</para></body>
</section>
</section>
</report>
Upvotes: 4
Views: 2549
Reputation: 163458
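For the first example, in XQuery 3.0 I would use:
declare context item := doc("example.xml");
(: group the title elements by their string value; after grouping, :)
(: $t holds every title element that shares the key :)
for $t in /report//section/title
group by $key := data($t)
where count($t) > 1
return $key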
Upvotes: 2
Reputation: 7590
In your first example, there are two problems. First off, you are not getting the nested sections, because you are only iterating over the section elements that are direct children of the report element. Secondly, you are using two loops over the same content. It is possible for both $x and $y to be the same element, so the where condition will match at least once for each section. I would write it like this:
for $x in distinct-values(/report//section/title)
where count(/report//section[title=$x]) > 1
return $x
The loop gets all unique titles and iterates over them (note that we use report//section to get all descendant sections). For each title, we count how many sections use it and keep only the titles that occur more than once. We then return the loop variable (which is bound to the title).
Running it, we get back
sec1 sec21
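If you would rather keep the two-loop shape of your original attempt, the self-match can be avoided by comparing node positions. The following is an untested sketch, not one of the queries I ran. It assumes the same example.xml and relies on the node-order operator `<<`, which is true only when the left node comes before the right node in document order:

```xquery
declare context item := doc("example.xml");

(: "$x << $y" excludes the case where $x and $y are the same section, :)
(: and it also visits each pair of sections only once.                :)
(: distinct-values() removes repeats when a title occurs 3+ times.    :)
distinct-values(
  for $x in /report//section,
      $y in /report//section
  where $x << $y and $x/title = $y/title
  return string($x/title)
)
```

On the sample document this should select the same two titles as the query above. It compares every pair of sections, so the distinct-values/count version is likely the better choice for large documents.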
In the second case, we have the same problem of not getting all descendants. We also need to call count() on the node sequences, because comparing a sequence of nodes to a number does not count them. I would use:
for $x in /report//section
where count($x/body/para) > 9 or count($x/section) > 4
return <large>{$x/@number} {string($x/title)}</large>
Notice that I selected $x/body/para to get the paragraphs in the section (they occur as children of the body element). This counts only the direct children, but it can be modified to count all descendants if necessary. Notice also the use of curly brackets in the direct element constructor. When we construct a direct element, all text is read literally. The curly brackets are used to evaluate an XQuery expression instead of literal text.
I used the string function on the title in order to extract the text content of the element. If we didn't do that, we would get an actual title element instead of its content (which may be the desired behavior). Because we select the number attribute itself, it becomes an attribute on our constructed element (if we wanted it to be text, we could have applied the string function to it).
In this case, it returns
<large number="1">sec1</large>
<large number="2">sec2</large>
<large number="2.1">sec21</large>
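As a variation, here is a sketch of the same query with the section number emitted as text rather than as an attribute. It is untested and assumes the same example.xml. Both values sit inside one pair of curly brackets because adjacent atomic values from a single enclosed expression are joined with a single space, whereas whitespace between two separate enclosed expressions is normally stripped as boundary whitespace:

```xquery
declare context item := doc("example.xml");

(: To count at any depth instead of direct children only, the tests  :)
(: could be written as count($x//para) and count($x//section).       :)
for $x in /report//section
where count($x/body/para) > 9 or count($x/section) > 4
(: string() turns the attribute into text, so no attribute is copied :)
return <large>{string($x/@number), string($x/title)}</large>
```

With the default boundary-space setting, the first result should then read `<large>1 sec1</large>` instead of carrying a number attribute.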
The examples here were tested against the OP's provided XML (example.xml) using Saxon-HE 9.7.0.2J. Only the relevant parts appear above; the complete first example, as run, looks like
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "text";
declare context item := doc("example.xml");
for $x in distinct-values(/report//section/title)
where count(/report//section[title=$x]) > 1
return $x
and the complete second example looks like
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare context item := doc("example.xml");
for $x in /report//section
where count($x/body/para) > 9 or count($x/section) > 4
return <large>{$x/@number} {string($x/title)}</large>
Upvotes: 4