Extract attributes with same name for all nodes in an xml file using R

Question

I am trying to extract all attributes with the same name from an xml file. I am currently using the xml2 package and was hoping to have success with the xml_attr or xml_attrs functions.

library(xml2)

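# basic xml file
# (tag names <b> and <c> follow the xml_child() calls below; the root name <a> is assumed)
x <- read_xml("<a>
  <b><c>123</c></b>
  <b><c>456</c></b>
</a>")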

# add a few attributes with the same name, "FakeID"
xml_set_attr(xml_child(x, 'b[1]'), 'FakeID', '11111')
xml_set_attr(xml_child(x, 'b[2]'), 'FakeID', '22222')
xml_set_attr(xml_child(xml_child(x, 'b[2]'), 'c'), 'FakeID', '33333')

# this will give me attributes only when I call a specific child node
xml_attr(xml_child(x, 'b[1]'), 'FakeID')
# this does not give me any attributes with the name "FakeID" because the current node
#   doesn't have that attribute
xml_attr(x, 'FakeID')

What I am ultimately hoping for is a vector containing the value of the "FakeID" attribute for every node in the xml that has it: c('11111', '22222', '33333')

the-mad-statter · Accepted Answer

I used the rvest package because it re-exports the xml2 functions as well as the %>% operator. I then wrote your xml out as a string, to be clear about what is in there, and added a second attribute to your first node.

In xml_nodes() I select all nodes with the * CSS selector and use [FakeID] to keep only the nodes that have the FakeID attribute.

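library(rvest)

# XML reconstructed from the question's code; the root name <a> and the extra
# attribute on the first <b> (OtherID, a placeholder name) are assumptions
"<a>
   <b FakeID='11111' OtherID='99999'>
     <c>123</c>
   </b>
   <b FakeID='22222'>
     <c FakeID='33333'>456</c>
   </b>
 </a>" %>% 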
  read_xml() %>% 
  xml_nodes("*[FakeID]") %>% 
  xml_attrs() %>% 
  pluck("FakeID") %>% 
  unlist()
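
If you would rather stay within xml2 alone, the same idea can be sketched with an XPath query instead of a CSS selector. This is only a minimal sketch, not part of the answer above: it assumes the document looks like the one in the question (the root name <a> is a guess), and it relies on xml_find_all() and xml_attr() from xml2. It may also be a useful fallback if xml_nodes() or pluck() are deprecated or not exported in the rvest version you have installed.

```r
library(xml2)

# Same structure as in the question; the root name <a> is an assumption
x <- read_xml("<a>
  <b FakeID='11111'><c>123</c></b>
  <b FakeID='22222'><c FakeID='33333'>456</c></b>
</a>")

# XPath: every element, at any depth, that carries a FakeID attribute
nodes <- xml_find_all(x, "//*[@FakeID]")

# xml_attr() is vectorised over the node set and returns a character vector
fake_ids <- xml_attr(nodes, "FakeID")
fake_ids
# [1] "11111" "22222" "33333"
```

The values come back in document order, so an outer node is listed before any nested node that also carries the attribute.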
