Reputation: 417
I'm trying to get to grips with the xml2 package for retrieving and filtering XML nodes in R.
I have an XML file with structure...
...
<entry>
<feature type="x">123</feature>
<feature type="y">456</feature>
<feature type="y">789</feature>
</entry>
...
...and I'm trying to retrieve just the first "feature" of type "y" in a single statement.
At the moment I'm doing this as follows:
# Return all <feature> nodes
xmlNodes <- xml_find_all(inputXml, ".//entry/feature")
# ...filter by type="y"...
xmlNodes <- xmlNodes[xml_attr(xmlNodes, "type")=="y"]
# ...and then return the first node
xmlNode <- xmlNodes[1]
Is there an easier way that I can achieve this in a single statement, perhaps using the xml_find_first() function with that "type" == "y" condition, assuming that the first feature node might not necessarily be "type" = "y"?
Maybe something like:
xmlNode <- xml_find_first(inputXml, ".//entry/feature" & xml_attr(inputXml, "type")=="y")
I feel like this is a very simple question but I'm new to R and not quite familiar with all the syntax... many thanks!
Upvotes: 0
Views: 656
Reputation: 173858
This is a question about XPath syntax rather than R syntax. Your example isn't a valid XML document on its own, so I have expanded it a little to demonstrate:
xml <- '<?xml version="1.0"?>
<entries>
<entry>
<feature type="x">123</feature>
<feature type="y">456</feature>
<feature type="y">789</feature>
</entry>
<entry>
<feature type="x">12</feature>
<feature type="y">13</feature>
<feature type="y">14</feature>
</entry>
</entries>'
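If I understand you correctly, you want the first feature of type = "y" in each entry, so in my example this would be the nodes containing text "456" and "13". In that case, the correct XPath expression is "//feature[@type = 'y'][1]". The [1] predicate is evaluated among the matching children of each parent element, which is why it yields one node per entry rather than one node overall.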
So you would get the correct nodes with:
xml2::xml_find_all(xml2::read_xml(xml), "//feature[@type = 'y'][1]")
#> {xml_nodeset (2)}
#> [1] <feature type="y">456</feature>
#> [2] <feature type="y">13</feature>
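Since the question asks about xml_find_first() specifically, here is a minimal sketch of how the same predicate could be used with it. It assumes the expanded example document from above; the variable names (doc, first_y, first_y_overall, first_y_per_entry) are only illustrative. It also shows the parenthesised form "(//feature[@type = 'y'])[1]", where [1] applies to the whole result set instead of per parent.

```r
library(xml2)

# Same expanded document as in the answer above
doc <- read_xml('<?xml version="1.0"?>
<entries>
<entry>
<feature type="x">123</feature>
<feature type="y">456</feature>
<feature type="y">789</feature>
</entry>
<entry>
<feature type="x">12</feature>
<feature type="y">13</feature>
<feature type="y">14</feature>
</entry>
</entries>')

# On a document, xml_find_first() returns a single node:
# the first match in document order
first_y <- xml_find_first(doc, "//feature[@type = 'y']")
xml_text(first_y)
#> [1] "456"

# Parenthesised XPath: [1] is applied to the combined result set,
# so only one node is selected for the whole document
first_y_overall <- xml_find_all(doc, "(//feature[@type = 'y'])[1]")
xml_text(first_y_overall)
#> [1] "456"

# On a node set, xml_find_first() returns one result per input node,
# here the first type = "y" feature of each entry
entries <- xml_find_all(doc, "//entry")
first_y_per_entry <- xml_find_first(entries, "./feature[@type = 'y']")
xml_text(first_y_per_entry)
#> [1] "456" "13"
```

Which form fits depends on whether you want one node for the whole file or one per entry. If an entry has no matching feature, xml_find_first() returns a missing node in that position, so the result stays the same length as the input.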
Upvotes: 3