adl

Reputation: 1441

r - extracting nodes from xml file using xml2 while keeping original sequence of nodes

I have this xml file:

txt <- read_xml(
  "<messages>
    <mes>
     <element id=\"159\" error=\"info1\"/>
     <element id=\"183\">
      <text>text1</text>
     </element>
    </mes>
    <mes>
     <element id=\"159\" error=\"info2\"/>
     <element id=\"183\">
      <text>text2</text>
     </element>
    </mes>
    <mes>
     <element id=\"159\" error=\"info3\"/>
    </mes>
   </messages>"
)

I'm trying to extract all "element" nodes while keeping track of their original order in the XML file. I tried using the xml2 package:

> txt %>% xml2::xml_find_all("mes") %>% xml_find_all("element")
{xml_nodeset (5)}
[1] <element id="159" error="info1"/>
[2] <element id="183">\n  <text>text1</text>\n</element>
[3] <element id="159" error="info2"/>
[4] <element id="183">\n  <text>text2</text>\n</element>
[5] <element id="159" error="info3"/>

This returns all five nodes, but the result no longer shows which "mes" node each one came from.

Ultimately, I would like to get something like this:

data.frame(
  sequence = c(1, 1, 2, 2, 3),
  element_id = c(159, 183, 159, 183, 159),
  error = c("info1", NA, "info2", NA, "info3"),
  text = c(NA, "text1", NA, "text2", NA)
)

where sequence is the position of the "mes" node that each element belongs to.

Is this possible?

Upvotes: 1

Answers (1)

Dave2e

Reputation: 24079

One solution is to count the number of "element" nodes in each "mes" node. From these counts you can then generate the desired sequence.

library(xml2)

# Create a vector with the number of element nodes in each mes node.
subnodes <- sapply(xml_find_all(txt, "mes"), function(x) length(xml_find_all(x, "element")))

# Create the desired sequence: repeat each mes index once per element it contains.
sequence <- rep(seq_along(subnodes), times = subnodes)
sequence
#[1] 1 1 2 2 3
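
To get from this sequence to the data frame asked for in the question, the same counts can be combined with xml_attr() and xml_text(). The following is only a sketch, not part of the original answer: it assumes the input has exactly the structure shown in the question, and the names mes_nodes, elements, n_per_mes and result are made up for illustration. It relies on xml_attr() returning NA for a missing attribute and on xml_find_first() returning a missing node (whose text is NA) when an element has no "text" child.

```r
library(xml2)

# Same structure as the XML in the question.
txt <- read_xml(
  '<messages>
    <mes>
     <element id="159" error="info1"/>
     <element id="183"><text>text1</text></element>
    </mes>
    <mes>
     <element id="159" error="info2"/>
     <element id="183"><text>text2</text></element>
    </mes>
    <mes>
     <element id="159" error="info3"/>
    </mes>
   </messages>'
)

# All mes nodes, and all element nodes in document order.
mes_nodes <- xml_find_all(txt, "mes")
elements <- xml_find_all(txt, "mes/element")

# Number of element children in each mes node.
n_per_mes <- vapply(
  mes_nodes,
  function(m) length(xml_find_all(m, "element")),
  integer(1)
)

# One row per element; missing attributes and missing text children become NA.
result <- data.frame(
  sequence = rep(seq_along(mes_nodes), times = n_per_mes),
  element_id = as.integer(xml_attr(elements, "id")),
  error = xml_attr(elements, "error"),
  text = xml_text(xml_find_first(elements, "text")),
  stringsAsFactors = FALSE
)

print(result)
```

Here the missing values are real NA values rather than the string "NA". If literal strings are needed, they can be substituted afterwards.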

Upvotes: 1
