awenborn

Reputation: 417

Conditional Retrieval of XML Nodes with the xml2 package in R

I'm trying to get to grips with the xml2 package in retrieving and filtering XML nodes in R.

I have an XML file with structure...

...
 <entry>
  <feature type="x">123</feature>
  <feature type="y">456</feature>
  <feature type="y">789</feature>
 </entry>
...

...and I'm trying to retrieve just the first "feature" of type "y" in a single statement.

At the moment I'm doing this as follows:

# Return all <feature> nodes
xmlNodes <- xml_find_all(inputXml, ".//entry/feature")

# ...filter by type="y"...
xmlNodes <- xmlNodes[xml_attr(xmlNodes, "type")=="y"]

# ...and then return the first node
xmlNode <- xmlNodes[1]

Is there an easier way to achieve this in a single statement, perhaps using the xml_find_first() function with a condition on the "type" attribute, given that the first feature node is not necessarily of type "y"?

Maybe something like:

xmlNode <- xml_find_first(inputXml, ".//entry/feature" & xml_attr(inputXml, "type")=="y")

I feel like this is a very simple question but I'm new to R and not quite familiar with all the syntax... many thanks!

Upvotes: 0

Views: 656

Answers (1)

Allan Cameron

Reputation: 173858

This is about XPath syntax, not R syntax. Your example isn't a valid XML document on its own, so I have expanded it a little to demonstrate:

xml <- '<?xml version="1.0"?>
<entries>
<entry>
    <feature type="x">123</feature>
    <feature type="y">456</feature>
    <feature type="y">789</feature>
</entry>
<entry>
    <feature type="x">12</feature>
    <feature type="y">13</feature>
    <feature type="y">14</feature>
</entry>
</entries>'

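If I understand you correctly, you want the first feature of type = "y" in each entry, so in my example this would be the nodes containing text "456" and "13". In that case, an XPath expression that does this is "//feature[@type = 'y'][1]". The [1] predicate is evaluated per parent node, so it picks the first matching feature within each entry rather than the first in the whole document.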

So you would get the correct nodes with:

library(magrittr)  # provides the %>% pipe used below
xml2::read_xml(xml) %>% xml2::xml_find_all("//feature[@type = 'y'][1]")
#> {xml_nodeset (2)}
#> [1] <feature type="y">456</feature>
#> [2] <feature type="y">13</feature>
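If you instead want only the very first matching node in the whole document, which is how the question's own three-step code behaves, a sketch along these lines should work. It repeats the sample document so that it runs on its own. The variable names (doc, first_overall, first_paren, entries, first_per_entry) are just illustrative, and this is one possible approach rather than the only one.

```r
library(xml2)

# Same sample document as above, repeated so the snippet is self-contained
doc <- read_xml('<entries>
<entry>
    <feature type="x">123</feature>
    <feature type="y">456</feature>
    <feature type="y">789</feature>
</entry>
<entry>
    <feature type="x">12</feature>
    <feature type="y">13</feature>
    <feature type="y">14</feature>
</entry>
</entries>')

# First matching node in the whole document (returns a single node)
first_overall <- xml_find_first(doc, "//feature[@type = 'y']")
xml_text(first_overall)
#> [1] "456"

# Same result in pure XPath: the parentheses make [1] apply to the
# full set of matches instead of to each parent separately
first_paren <- xml_find_all(doc, "(//feature[@type = 'y'])[1]")
xml_text(first_paren)
#> [1] "456"

# First match per entry, written with xml_find_first() on a nodeset:
# the result has one element for each <entry>
entries <- xml_find_all(doc, "//entry")
first_per_entry <- xml_find_first(entries, "./feature[@type = 'y']")
xml_text(first_per_entry)
#> [1] "456" "13"
```

Note that xml_find_first() applied to a nodeset returns a result of the same length as its input, so an entry with no type "y" feature would show up as a missing node rather than being dropped.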

Upvotes: 3
