Reputation: 2506
I am trying to extract all attributes with the same name from an XML file. I'm currently using the xml2 package and was hoping to have success with the xml_attr or xml_attrs functions.
library(xml2)
# basic xml file
x <- read_xml("<a>
<b><c>123</c></b>
<b><c>456</c></b>
</a>")
# add a few attributes that share the same name, "FakeID"
xml_set_attr(xml_child(x, 'b[1]'), 'FakeID', '11111')
xml_set_attr(xml_child(x, 'b[2]'), 'FakeID', '22222')
xml_set_attr(xml_child(xml_child(x, 'b[2]'), 'c'), 'FakeID', '33333')
# this only gives me the attribute when I select a specific child node
xml_attr(xml_child(x, 'b[1]'), 'FakeID')
# this returns NA instead of any "FakeID" values, because the root node
# itself doesn't have that attribute
xml_attr(x, 'FakeID')
What I ultimately want is a vector containing the "FakeID" value of every node in the XML that has that attribute, i.e. c('11111', '22222', '33333').
Upvotes: 0
Views: 1136
Reputation: 8711
I used the rvest package because it re-exports the xml2 functions and also the %>% operator. I then wrote your XML as a string, to make clear what is in it, and added a second attribute (RealID) to your first node.
In xml_nodes() I select all nodes with the * CSS selector and restrict the result to nodes that have a FakeID attribute with [FakeID].
library(rvest)
"<a>
<b FakeID=\"11111\" RealID=\"abcde\">
<c>123</c>
</b>
<b FakeID=\"22222\">
<c FakeID=\"33333\">456</c>
</b>
</a>" %>%
read_xml() %>%
xml_nodes("*[FakeID]") %>%
xml_attrs() %>%
pluck("FakeID") %>%
unlist()
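If you'd rather not depend on rvest, the same result can likely be had with xml2 alone by using an XPath expression instead of a CSS selector. This is a minimal sketch, assuming the same sample document as above; xml_find_all() and xml_attr() are xml2 functions, while the variable names nodes and fake_ids are purely illustrative. Because xml_attr() is vectorised over a nodeset, no pluck()/unlist() step is needed.

```r
library(xml2)

# Same sample document as above, with the FakeID attributes written
# directly into the markup
x <- read_xml('<a>
  <b FakeID="11111"><c>123</c></b>
  <b FakeID="22222"><c FakeID="33333">456</c></b>
</a>')

# XPath: "//*" matches every element at any depth, and the predicate
# [@FakeID] keeps only the elements that carry a FakeID attribute
nodes <- xml_find_all(x, "//*[@FakeID]")

# xml_attr() returns one string per node in the nodeset, in document order
fake_ids <- xml_attr(nodes, "FakeID")
fake_ids
#> [1] "11111" "22222" "33333"
```

Without the [@FakeID] predicate, xml_find_all(x, "//*") would also return the nodes lacking the attribute, and xml_attr() would give NA for each of them.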
Upvotes: 1