Jonathan Hill

How to update XML attributes without a loop using the xml2 R package

I have an xml object that I would like to update using R's xml2 package. There are two things I generally need to do:

  1. Update text inside nodes <c>{text}</c>
  2. Update attributes of nodes <c name={text}/>

I'd like to avoid looping over the xml structure because that is significantly slower than identifying the node set and assigning an entire vector of values to it at once.

library(xml2)
library(magrittr)  # for %>%

xml <- read_xml("<root>
        <c name='test' db_name='TEST'><d>This is the column desc</d></c>
        <c name='test2' db_name='TEST2'><d>This is the column desc</d></c>
        <c name='test3' db_name='TEST3'><d>This is the column desc</d></c>
    </root>")

df <- data.frame(
    db_name = c("TEST2", "TEST", "TEST3"),
    desc = c("New desc!", "You want this desc", "GOOD VECTOR"),
    disp_name = c("OKAY", "NOW", "HAPPY"),
    stringsAsFactors = FALSE)

We are GOOD on #1

c_nodes           <- xml %>% xml_find_all("//c")
c_db_names        <- c_nodes %>% xml_find_all("@db_name") %>% xml_text    
xml_text(c_nodes) <- df$desc[match(c_db_names, df$db_name)]
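For reference, here is a self-contained version of #1. It relies only on xml2 and uses xml_attr(), which, applied to a nodeset, returns one attribute value per node (NA where the attribute is missing). That makes it a shorter alternative to the xml_find_all("@db_name") %>% xml_text chain, and both should yield the same character vector. The data is the question's own, trimmed to the columns this step needs.

```r
library(xml2)

xml <- read_xml("<root>
  <c name='test' db_name='TEST'><d>This is the column desc</d></c>
  <c name='test2' db_name='TEST2'><d>This is the column desc</d></c>
  <c name='test3' db_name='TEST3'><d>This is the column desc</d></c>
</root>")

df <- data.frame(
  db_name = c("TEST2", "TEST", "TEST3"),
  desc    = c("New desc!", "You want this desc", "GOOD VECTOR"),
  stringsAsFactors = FALSE
)

c_nodes <- xml_find_all(xml, "//c")

# xml_attr() on a nodeset is vectorised: one value per node, in document order
c_db_names <- xml_attr(c_nodes, "db_name")

# match() reorders df$desc to follow document order; assign the whole vector
xml_text(c_nodes) <- df$desc[match(c_db_names, df$db_name)]

print(xml_text(c_nodes))
```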

BAD on #2

disp_names <- df$disp_name[match(c_db_names, df$db_name)]

for (i in seq_along(c_nodes)) {
  xml_attr(c_nodes[i], "name") <- disp_names[i]
}

When I try xml_attr(c_nodes, "name") <- df$disp_name[match(c_db_names, df$db_name)], I get the following error:

Error in node_set_attr(x$node, name = attr, nsMap = ns, value) : expecting a single value

If I provide a single value, it is applied to every node in the set, but I need a different value on each node. So for now I'm using a loop, but I'd like to replace it with a vectorized equivalent that produces this:

{xml_document}
<root>
[1] <c name="NOW" db_name="TEST">\n  <d>You want this desc</d>\n</c>
[2] <c name="OKAY" db_name="TEST2">\n  <d>New desc!</d>\n</c>
[3] <c name="HAPPY" db_name="TEST3">\n  <d>GOOD VECTOR</d>\n</c>

Answers (1)

Tom B

xml_set_attrs() is the function to use here, but its value argument must be a list containing one named character vector per node. You can build that list with lapply() and then pass it in:

new_attrs <- lapply(df$disp_name[match(c_db_names, df$db_name)],
                    function(x) {
                      names(x) <- "name"
                      x
                    })

xml_set_attrs(c_nodes, new_attrs)
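One caveat, hedged: xml2 documents xml_set_attrs() as setting *all* attributes of a node, so depending on your xml2 version, passing only `name` may drop the existing db_name attribute rather than keep it. The sketch below avoids depending on that behaviour. It carries db_name along in each named vector, so the result should match the output the question asks for either way. It builds the list with mapply() instead of lapply(); that choice is mine, not the answer's.

```r
library(xml2)

xml <- read_xml("<root>
  <c name='test' db_name='TEST'><d>This is the column desc</d></c>
  <c name='test2' db_name='TEST2'><d>This is the column desc</d></c>
  <c name='test3' db_name='TEST3'><d>This is the column desc</d></c>
</root>")

df <- data.frame(
  db_name   = c("TEST2", "TEST", "TEST3"),
  disp_name = c("OKAY", "NOW", "HAPPY"),
  stringsAsFactors = FALSE
)

c_nodes    <- xml_find_all(xml, "//c")
c_db_names <- xml_attr(c_nodes, "db_name")
disp_names <- df$disp_name[match(c_db_names, df$db_name)]

# One named character vector per node. db_name is included so it survives
# even if xml_set_attrs() replaces the node's whole attribute set.
new_attrs <- mapply(function(disp, db) c(name = disp, db_name = db),
                    disp_names, c_db_names,
                    SIMPLIFY = FALSE, USE.NAMES = FALSE)

# Modifies the nodes in place (xml2 documents are external pointers)
xml_set_attrs(c_nodes, new_attrs)

print(xml_attrs(c_nodes))
```

xml_set_attrs() still visits each node internally, but the call site no longer needs an explicit for loop.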
