Louis Maddox

Reputation: 5566

CSS/Xpath sibling selector in Nokogiri

I have the following XML tree and need to extract the given names and surname only for the contrib tags that have a child xref node with ref-type "corresp".

<pmc-articleset>
 <article>
  <front>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Wereszczynski</surname>
            <given-names>Jeff</given-names>
          </name>
          <xref rid="aff1" ref-type="aff"/>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Andricioaei</surname>
            <given-names>Ioan</given-names>
          </name>
          <xref rid="aff1" ref-type="aff"/>
          <xref ref-type="corresp" rid="cor1">*</xref>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
</pmc-articleset>

I saw "Getting the siblings of a node with Nokogiri" which points out the CSS sibling selectors that can be used in Nokogiri, but, following the example given, my code gives siblings indiscriminately.

require "net/http"
require "nokogiri"
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?id=PMC1637560&db=pmc"
xml_data = Net::HTTP.get_response(URI.parse(url)).body
parsedoc = Nokogiri::XML.parse(xml_data)
corrdetails = parsedoc.at('contrib:has(xref[text()="*"])')
puts surname = corrdetails.xpath( "//surname" ).text
puts givennames = corrdetails.xpath("//given-names").text

=> WereszczynskiAndricioaei
=> JeffIoan

I only want the sibling node when the contrib contains <xref ref-type="corresp">*</xref>, that is, an output of:

=> Andricioaei
=> Ioan

I've currently implemented this without referring to ref-type but rather selecting the asterisk within the xref tag (either is appropriate).

Upvotes: 0

Views: 1422

Answers (1)

Justin Ko

Reputation: 46836

The problem is actually with your XPath for getting the surname and given names, i.e., the XPath is incorrect in these lines:

puts surname = corrdetails.xpath( "//surname" ).text
puts givennames = corrdetails.xpath("//given-names").text

Starting an XPath with // searches the whole document, regardless of which node you call xpath on. You only want to look within the corrdetails node, which means the XPath needs to start with a dot, e.g., .//.

Change the two lines to:

puts surname = corrdetails.xpath( ".//surname" ).text
puts givennames = corrdetails.xpath(".//given-names").text

Upvotes: 2
