Fenil Dedhia
Fenil Dedhia

Reputation: 365

How to use XQuery to extract specific XML records based on multiple conditions in comma delimited format?

Input file:

<?xml version="1.0" encoding="UTF-8"?> 
        <books>
            <book id="6636551">
                <master_information>
                    <book_xref>
                        <xref type="Fiction" type_id="1">72771KAM3</xref>
                        <xref type="Non_Fiction" type_id="2">US72771KAM36</xref>
                    </book_xref>
                </master_information>
                <book_details>
                    <price>24.95</price>
                    <publish_date>2000-10-01</publish_date>
                    <description>An in-depth look at creating applications with XML.</description>
                </book_details>
                <global_information>
                    <ratings>
                        <rating agency="ABC Agency" type="Author Rating">A++</rating>
                        <rating agency="DEF Agency" type="Author Rating">A+</rating>
                        <rating agency="DEF Agency" type="Book Rating">A</rating>
                    </ratings>
                </global_information>
                <country_info>
                    <country_code>US</country_code>
                </country_info>
            </book>
            <book id="119818569">
                <master_information>
                    <book_xref>
                        <xref type="Fiction" type_id="1">070185UL5</xref>
                        <xref type="Non_Fiction" type_id="2">US070185UL50</xref>
                    </book_xref>
                </master_information>
                <book_details>
                    <price>19.25</price>
                    <publish_date>2002-11-01</publish_date>
                    <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
                </book_details>
                <global_information>
                    <ratings>
                        <rating agency="ABC Agency" type="Author Rating">A+</rating>
                        <rating agency="ABC Agency" type="Book Rating">A</rating>
                        <rating agency="DEF Agency" type="Author Rating">A</rating>
                        <rating agency="DEF Agency" type="Book Rating">B+</rating>
                    </ratings>
                </global_information>
                <country_info>
                    <country_code>CA</country_code>
                </country_info>
            </book>
            <book id="119818568">
                <master_information>
                    <book_xref>
                        <xref type="Fiction" type_id="1">070185UK7</xref>
                        <xref type="Non_Fiction" type_id="2">US070185UK77</xref>
                    </book_xref>
                </master_information>
                <book_details>
                    <price>5.95</price>
                    <publish_date>2004-05-01</publish_date>
                    <description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
                </book_details>
                <global_information>
                    <ratings>
                        <rating agency="ABC Agency" type="Author Rating">A+</rating>
                        <rating agency="ABC Agency" type="Book Rating">A+</rating>
                        <rating agency="DEF Agency" type="Author Rating">B++</rating>
                        <rating agency="DEF Agency" type="Book Rating">A+</rating>
                    </ratings>
                </global_information>
                <country_info>
                    <country_code>UK</country_code>
                </country_info>
            </book>
            <book id="119818567">
                <master_information>
                    <book_xref>
                        <xref type="Fiction" type_id="1">070185UJ0</xref>
                        <xref type="Non_Fiction" type_id="2">US070185UJ05</xref>
                    </book_xref>
                </master_information>
                <book_details>
                    <price>4.95</price>
                    <publish_date>2000-09-02</publish_date>
                    <description>When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled.</description>
                </book_details>
                <global_information>
                    <ratings>
                        <rating agency="ABC Agency" type="Author Rating">B+</rating>
                        <rating agency="ABC Agency" type="Book Rating">A+</rating>
                        <rating agency="DEF Agency" type="Author Rating">B++</rating>
                        <rating agency="DEF Agency" type="Book Rating">A+</rating>
                    </ratings>
                </global_information>
                <country_info>
                    <country_code>US</country_code>
                </country_info>
            </book>
        </books>

I wrote a XQuery to fetch specific XML records in a CSV format:

for $x in string-join(('book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating', //book//global_information/ratings/rating[@type='Author Rating' and .=('A+','B++')]/string-join((ancestor::book/@id, ancestor::book//book_xref/xref/@type, ancestor::book//book_xref/xref, ancestor::book//country_info/country_code, ancestor::book//book_details/description, @agency, @type, .), ',')), '&#10;')
return $x

Expected output is:

book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating
6636551,Fiction,72771KAM3,US,An in-depth look at creating applications with XML.,DEF Agency,Author Rating,A+
6636551,Non_Fiction,US72771KAM36,US,An in-depth look at creating applications with XML.,DEF Agency,Author Rating,A+
119818569,Fiction,070185UL5,CA,A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.,ABC Agency,Author Rating,A+
119818569,Non_Fiction,US070185UL50,CA,A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.,ABC Agency,Author Rating,A+
etc.

where I expected book_ids to be repeated since I didn't filter on xref_type so it should show them both in separate lines but it doesn't. The output generated is as follows:

book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating
6636551,Fiction,Non_Fiction,72771KAM3,US72771KAM36,US,An in-depth look at creating applications with XML.,DEF Agency,Author Rating,A+
119818569,Fiction,Non_Fiction,070185UL5,US070185UL50,CA,A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.,ABC Agency,Author Rating,A+

When I filter on the xref_type="Fiction" I have to do it on both xref/@type and xref. It works but is there a better way to do this?

Essentially, my question can be broken down into three small questions:

  1. How can I have "Fiction" and "Non-Fiction" items listed on separate lines for the same given book_id?
  2. Is there a better way to write the conditions for this code?
  3. How can I write a condition in a way that outputs blank values when data is absent for a specific condition? Is this done automatically?

I appreciate your help!

Upvotes: 1

Views: 175

Answers (1)

Leo W&#246;rteler
Leo W&#246;rteler

Reputation: 4241

At the point at which the XPath expression ancestor::book//book_xref/xref/@type is evaluated, the context is a single rating element and the result is a sequence of type="..." attributes. So when you assemble your CSV line, all types of book references of the ancestor book element are treated as one value (which can be a sequence of items).

If you want to separate each single item into its separate CSV line, you have to recursively iterate over the sequence using e.g. a for loop. I bind the book to a variable using let and the loop over all xrefs:

string-join(
  (
    'book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating',
    for $rating in //book//global_information/ratings/rating[@type='Author Rating' and .=('A+','B++')]
    let $book := $rating/ancestor::book
    for $xref in $book//xref
    return (
      string-join(
        (
          $book/@id,
          $xref/@type,
          $xref/text(),
          $book//country_info/country_code,
          $book//book_details/description/text(),
          $rating/@agency,
          $rating/@type,
          $rating/text()
        ),
        ','
      )
    )
  ),
  '&#10;'
)

If you want to avoid backward axes (which can sometimes be a bit awkward), you can also iterate over books first and then over the ratings you are interested in:

string-join(
  (
    'book_id, xref_type, xref, country, desc, rating_agency, rating_type, rating',
    for $book in //book
    for $rating in $book/global_information/ratings/rating[@type='Author Rating' and .=('A+','B++')]
    for $xref in $book//xref
    [...]

Upvotes: 1

Related Questions