Fenil Dedhia
Fenil Dedhia

Reputation: 365

How to use XQuery to extract specific XML records and output in comma delimited format?

I'm trying to extract only <xref> data along with their book IDs using a XQuery (I'm new to this).

Here is the input data:

  <book id="6636551">
    <master_information>
        <book_xref>
            <xref type="Fiction" type_id="1">72771KAM3</xref>
            <xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
        </book_xref>
    </master_information>
    <book_details>
        <price>24.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
    </book_details>
  </book>
  <book id="119818569">
    <master_information>
        <book_xref>
            <xref type="Fiction" type_id="1">070185UL5</xref>
            <xref type="Non_Fiction" type_id="2">US070185UL50</xref>
        </book_xref>
    </master_information>
    <book_details>
        <price>19.25</price>
        <publish_date>2002-11-01</publish_date>
        <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
    </book_details>
  </book>
  <book id="119818568">
  <master_information>
        <book_xref>
            <xref type="Fiction" type_id="1">070185UK7</xref>
            <xref type="Non_Fiction" type_id="2">US070185UK77</xref>
        </book_xref>
    </master_information>
    <book_details>
        <price>5.95</price>
        <publish_date>2004-05-01</publish_date>
        <description>After the collapse of a nanotechnology 
  society in England, the young survivors lay the 
  foundation for a new society.</description>
    </book_details>
  </book>
  <book id="119818567">
    <master_information>
        <book_xref>
            <xref type="Fiction" type_id="1">070185UJ0</xref>
            <xref type="Non_Fiction" type_id="2">US070185UJ05</xref>
        </book_xref>
    </master_information>
    <book_details>
        <price>4.95</price>
        <publish_date>2000-09-02</publish_date>
        <description>When Carla meets Paul at an ornithology 
  conference, tempers fly as feathers get ruffled.</description>
    </book_details>
  </book>


Expected output format 1:

  <book id="6636551">
    <master_information>
        <book_xref>
            <xref type="Fiction" type_id="1">72771KAM3</xref>
            <xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
        </book_xref>
    </master_information>
  </book>

XQuery that I'm using for format 1:

  for$x in //book_xref/xref
  return $x

Question for format 1: I tried including book id separately so that it's included in the output but it doesn't match the expected format as I mentioned above. How do I get the book id to be fetched as well in the output per the format?


Expected output format 2 (comma delimited):

  book_id, xref_type, xref_type_id, xref
  6636551, Fiction, 1, 72771KAM3
  6636551, Non_Fiction, 2, US72771KAM36
  119818569, Fiction, 1, 070185UL5
  119818569, Non_Fiction, 2, US070185UL50
  etc.

Question for format 2: How can I get output in comma delimited format through XQuery? Do I need to stick to XSLT for that?

I appreciate your response.

Upvotes: 3

Views: 1196

Answers (2)

Joe Wicentowski
Joe Wicentowski

Reputation: 5294

XQuery is a great way to generate CSV files from your XML data, whether the source is a single XML document or a collection of XML documents stored on the filesystem or in an XML database. There are a number of approaches one might take in XQuery. For one that uses XQuery 3.1 array structures and serialization facility to corral your data into rows and cells, see the tutorial at https://github.com/CliffordAnderson/XQuery4Humanists/blob/master/05-Generating-JSON-and-CSV.md.

Upvotes: 2

Martin Honnen
Martin Honnen

Reputation: 167716

For the CSV you can use string-join i.e. for those four values you can use

//book//book_xref/xref/string-join((ancestor::book/@id, @type, @type_id, .), ',')

which would give a sequence of strings with the record data; if you want a single string with the header line and those data lines you can use another string-join:

string-join(('book_id,xref_type,xref_type_id,xref', //book//book_xref/xref/string-join((ancestor::book/@id, @type, @type_id, .), ',')), '&#10;')

For the transformation/XML extraction reconstruct the book elements with the xref descendants and add the master_information e.g.

//book[.//book_xref/xref]/<book id="{@id}">{master_information}</book>

Upvotes: 2

Related Questions