Reputation: 365
I'm trying to extract only <xref>
data along with their book IDs using a XQuery (I'm new to this).
Here is the input data:
<book id="6636551">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">72771KAM3</xref>
<xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
</book_xref>
</master_information>
<book_details>
<price>24.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with XML.</description>
</book_details>
</book>
<book id="119818569">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">070185UL5</xref>
<xref type="Non_Fiction" type_id="2">US070185UL50</xref>
</book_xref>
</master_information>
<book_details>
<price>19.25</price>
<publish_date>2002-11-01</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book_details>
</book>
<book id="119818568">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">070185UK7</xref>
<xref type="Non_Fiction" type_id="2">US070185UK77</xref>
</book_xref>
</master_information>
<book_details>
<price>5.95</price>
<publish_date>2004-05-01</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book_details>
</book>
<book id="119818567">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">070185UJ0</xref>
<xref type="Non_Fiction" type_id="2">US070185UJ05</xref>
</book_xref>
</master_information>
<book_details>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology
conference, tempers fly as feathers get ruffled.</description>
</book_details>
</book>
Expected output format 1:
<book id="6636551">
<master_information>
<book_xref>
<xref type="Fiction" type_id="1">72771KAM3</xref>
<xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
</book_xref>
</master_information>
</book>
XQuery that I'm using for format 1:
for$x in //book_xref/xref
return $x
Question for format 1: I tried including book id separately so that it's included in the output but it doesn't match the expected format as I mentioned above. How do I get the book id to be fetched as well in the output per the format?
Expected output format 2 (comma delimited):
book_id, xref_type, xref_type_id, xref
6636551, Fiction, 1, 72771KAM3
6636551, Non_Fiction, 2, US72771KAM36
119818569, Fiction, 1, 070185UL5
119818569, Non_Fiction, 2, US070185UL50
etc.
Question for format 2: How can I get output in comma delimited format through XQuery? Do I need to stick to XSLT for that?
I appreciate your response.
Upvotes: 3
Views: 1196
Reputation: 5294
XQuery is a great way to generate CSV files from your XML data, whether the source is a single XML document or a collection of XML documents stored on the filesystem or in an XML database. There are a number of approaches one might take in XQuery. For one that uses XQuery 3.1 array structures and serialization facility to corral your data into rows and cells, see the tutorial at https://github.com/CliffordAnderson/XQuery4Humanists/blob/master/05-Generating-JSON-and-CSV.md.
Upvotes: 2
Reputation: 167716
For the CSV you can use string-join
i.e. for those four values you can use
//book//book_xref/xref/string-join((ancestor::book/@id, @type, @type_id, .), ',')
which would give a sequence of strings with the record data; if you want a single string with the header line and those data lines you can use another string-join:
string-join(('book_id,xref_type,xref_type_id,xref', //book//book_xref/xref/string-join((ancestor::book/@id, @type, @type_id, .), ',')), ' ')
For the transformation/XML extraction reconstruct the book
elements with the xref
descendants and add the master_information
e.g.
//book[.//book_xref/xref]/<book id="{@id}">{master_information}</book>
Upvotes: 2