burnersk

Reputation: 3480

How to select an element which contains a specific subelement in XPath?

I have some MARC21-XML documents about books. I want to extract the names of the translators of the book.

Here is a snippet from one MARC21-XML document of a book:

<?xml version="1.0" encoding="UTF-8"?>
  <record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Wasel, Ulrike</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Timmermann, Klaus</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2="2">
      <subfield code="a">Eggers, Dave</subfield>
    </datafield>
  </record>

Dave Eggers is the author of the book, and Klaus Timmermann and Ulrike Wasel translated it.

In this scenario the following "simple" XPath 2.0 expression would work to extract the "translators":

/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']/subfield[@code='a']/text()

The result of this XPath 2.0 expression would be the following:

Text='Wasel, Ulrike'
Text='Timmermann, Klaus'

This seems to work nicely. However, I can think of a not-yet-encountered scenario in which there are additional elements with types other than translator (that is, where subfield[@code='4'] is not 'trl').

I would like to have the following selection logic implemented as XPath 2.0 but struggle to construct one:

To mock up the scenario:

<?xml version="1.0" encoding="UTF-8"?>
  <record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Wasel, Ulrike</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Timmermann, Klaus</subfield>
      <subfield code="4">trl</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Doe, John</subfield>
      <subfield code="4">oth</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2="2">
      <subfield code="a">Eggers, Dave</subfield>
    </datafield>
  </record>

In this scenario, the same "simple" XPath 2.0 expression would be used to extract the "translators":

/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']/subfield[@code='a']/text()

The result of this XPath 2.0 expression would be the following:

Text='Wasel, Ulrike'
Text='Timmermann, Klaus'
Text='Doe, John'

And there is the error: John Doe is not a translator (trl) but some other (oth) contributor to the book. I do not want him ;)

I am not that familiar with the MARC21-XML specification. The specifications about MARC21-XML which I have read are in a very strange tabular format that is hard to understand. It is possible that datafields with @ind1='1' and @ind2=' ' contain only translators, but then the "type" field with "trl" makes no sense.

How to construct an XPath 2.0 expression that selects only the translators from the mocked-up scenario?

Upvotes: 2

Views: 803

Answers (1)

kjhughes

Reputation: 111686

To further restrict this XPath,

/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']
       /subfield[@code='a']/text()

to select only those datafield elements whose subfield child element with code of 4 has a string value of "trl", add another predicate, [subfield[@code='4']='trl']:

/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']
                 [subfield[@code='4']='trl']
       /subfield[@code='a']/text()
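
To check the logic of the added predicate against the mocked-up record, here is a minimal sketch in Python using only the standard library. It rests on a few assumptions: xml.etree.ElementTree implements only a subset of XPath 1.0 and does not accept the nested predicate [subfield[@code='4']='trl'], so that test is emulated in plain Python; the prefix m and the function name translators are made up for this illustration; and the elements are addressed through the MARC21 slim namespace, because the unprefixed /record/... paths above only match if the XPath processor is configured with that namespace as its default element namespace. A full XPath 2.0 processor should be able to evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# Mocked-up record from the question (XML declaration omitted for brevity).
MARC_XML = """
<record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Wasel, Ulrike</subfield>
    <subfield code="4">trl</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Timmermann, Klaus</subfield>
    <subfield code="4">trl</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Doe, John</subfield>
    <subfield code="4">oth</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2="2">
    <subfield code="a">Eggers, Dave</subfield>
  </datafield>
</record>
"""

# Hypothetical prefix "m" bound to the MARC21 slim namespace.
NS = {"m": "http://www.loc.gov/MARC21/slim"}


def translators(xml_text):
    """Return the code 'a' names of 700 datafields whose code '4' is 'trl'."""
    root = ET.fromstring(xml_text)  # root is the <record> element
    names = []
    # Same attribute predicates as in the XPath expression.
    for field in root.findall("m:datafield[@tag='700'][@ind1='1'][@ind2=' ']", NS):
        # Emulates [subfield[@code='4']='trl']: true if ANY code-4 subfield is 'trl'.
        relators = [sf.text for sf in field.findall("m:subfield[@code='4']", NS)]
        if "trl" in relators:
            names.extend(sf.text for sf in field.findall("m:subfield[@code='a']", NS))
    return names


if __name__ == "__main__":
    print(translators(MARC_XML))
```

John Doe (oth) is excluded because his datafield has no code 4 subfield equal to "trl", and Dave Eggers is excluded already by the @ind2 test.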

Upvotes: 3
