Reputation: 3480
I have some MARC21-XML documents about books. I want to extract the names of the translators of the book.
Here is a snippet from one MARC21-XML document of a book:
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Wasel, Ulrike</subfield>
<subfield code="4">trl</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Timmermann, Klaus</subfield>
<subfield code="4">trl</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2="2">
<subfield code="a">Eggers, Dave</subfield>
</datafield>
</record>
Dave Eggers is the author of the book, and Klaus Timmermann and Ulrike Wasel translated it.
In this scenario the following "simple" XPath 2.0 expression would work to extract the "translators":
/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']/subfield[@code='a']/text()
The result of this XPath 2.0 expression would be the following:
Text='Wasel, Ulrike'
Text='Timmermann, Klaus'
This seems to work nicely. However, I can think of a not-yet-discovered scenario in which there are additional elements with types other than translator (subfield[@code='4'] = 'trl').
I would like to have the following selection logic implemented as XPath 2.0 but struggle to construct one:
- /record/datafield attribute tag has value "700"
- /record/datafield attribute ind1 has value "1"
- /record/datafield attribute ind2 has value " "
- /record/datafield contains subfield with attribute code equals "4" and its text() is "trl"
To mock up the scenario:
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Wasel, Ulrike</subfield>
<subfield code="4">trl</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Timmermann, Klaus</subfield>
<subfield code="4">trl</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Doe, John</subfield>
<subfield code="4">oth</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2="2">
<subfield code="a">Eggers, Dave</subfield>
</datafield>
</record>
In this scenario the same "simple" XPath 2.0 expression no longer extracts only the translators:
/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']/subfield[@code='a']/text()
The result of this XPath 2.0 expression would be the following:
Text='Wasel, Ulrike'
Text='Timmermann, Klaus'
Text='Doe, John'
And there is the error: John Doe is not a translator (trl) but some other (oth) contributor to the book. I do not want him ;)
I am not that familiar with the MARC21-XML specification. The specifications about MARC21-XML which I have read are in a very strange tabular format that is hard to understand. It is possible that @ind1='1' and @ind2=' ' mark only translators, but then the "type" field with "trl" makes no sense.
How do I construct an XPath 2.0 expression that selects only the translators from the mocked-up scenario?
Upvotes: 2
Views: 803
Reputation: 111686
To further restrict this XPath,
/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']
/subfield[@code='a']/text()
to select only those datafield elements whose subfield child element with code of 4 has a string value of "trl", add another predicate, [subfield[@code='4']='trl']:
/record/datafield[@tag='700'][@ind1='1'][@ind2=' ']
[subfield[@code='4']='trl']
/subfield[@code='a']/text()
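As a rough cross-check of the selection logic, here is a minimal Python sketch that spells out the same conditions with the standard library's xml.etree.ElementTree instead of an XPath 2.0 processor. The sample record declares a default namespace (http://www.loc.gov/MARC21/slim), so depending on your processor the unprefixed names in the XPath above may need a default element namespace configured or a prefix bound. The `marc` prefix, the `SAMPLE` constant, and the `translators` function are names made up for this illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix bound to the namespace declared on <record>.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# The mocked-up record from the question (XML declaration omitted).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim" type="Bibliographic">
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Wasel, Ulrike</subfield>
<subfield code="4">trl</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Timmermann, Klaus</subfield>
<subfield code="4">trl</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">Doe, John</subfield>
<subfield code="4">oth</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2="2">
<subfield code="a">Eggers, Dave</subfield>
</datafield>
</record>"""


def translators(xml_text):
    """Return the code 'a' values of 700 fields whose code '4' is 'trl'."""
    record = ET.fromstring(xml_text)
    names = []
    for field in record.findall("marc:datafield", MARC_NS):
        # Same three attribute tests as [@tag='700'][@ind1='1'][@ind2=' '].
        if (field.get("tag"), field.get("ind1"), field.get("ind2")) != ("700", "1", " "):
            continue
        subfields = field.findall("marc:subfield", MARC_NS)
        # Same test as [subfield[@code='4']='trl']: any code-4 subfield equal to 'trl'.
        if not any(sf.get("code") == "4" and sf.text == "trl" for sf in subfields):
            continue
        names.extend(sf.text for sf in subfields if sf.get("code") == "a")
    return names


if __name__ == "__main__":
    print(translators(SAMPLE))
```

Run against the mocked-up record, this should list Wasel and Timmermann only: Doe is dropped by the role test and Eggers by the ind2 test.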
Upvotes: 3