Reputation: 1441
I have this XML document:
library(xml2)
library(magrittr)  # provides the %>% pipe used below
txt <- read_xml(
"<messages>
<mes>
<element id=\"159\" error=\"info1\"/>
<element id=\"183\">
<text>text1</text>
</element>
</mes>
<mes>
<element id=\"159\" error=\"info2\"/>
<element id=\"183\">
<text>text2</text>
</element>
</mes>
<mes>
<element id=\"159\" error=\"info3\"/>
</mes>
</messages>"
)
I'm trying to extract all "element" nodes while keeping the original order of the nodes from the XML file. I tried using the xml2 package:
> txt %>% xml2::xml_find_all("mes") %>% xml_find_all("element")
{xml_nodeset (5)}
[1] <element id="159" error="info1"/>
[2] <element id="183">\n <text>text1</text>\n</element>
[3] <element id="159" error="info2"/>
[4] <element id="183">\n <text>text2</text>\n</element>
[5] <element id="159" error="info3"/>
This returns all the nodes, but not their sequence in the file.
In the end, I would like to get something like this:
data.frame(
sequence = c(1, 1, 2, 2, 3),
element_id = c(159, 183, 159, 183, 159),
error = c("info1", "NA", "info2", "NA", "info3"),
text = c("NA", "text1", "NA", "text2", "NA")
)
where sequence is the position of the parent "mes" node in the XML.
Is this possible?
Reputation: 24079
One solution is to count the number of "element" nodes in each "mes" node. From these counts you can then generate the desired sequence.
# Count the "element" nodes inside each "mes" node.
subnodes <- sapply(xml_find_all(txt, "mes"), function(x) length(xml_find_all(x, "element")))
# Repeat each "mes" index once per "element" node it contains.
sequence <- rep(seq_along(subnodes), times = subnodes)
sequence
#[1] 1 1 2 2 3
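To get from this sequence to the data frame asked for in the question, the same counts can be combined with the attributes and text of each "element" node. The following is only a sketch under a few assumptions: the XML is re-declared so the block runs on its own, the names `mes`, `counts`, `elements`, and `result` are made up for illustration, and missing values are returned as real `NA` rather than the string "NA" shown in the question.

```r
library(xml2)

# Same document as in the question, repeated so the example is self-contained.
txt <- read_xml(
  '<messages>
     <mes>
       <element id="159" error="info1"/>
       <element id="183"><text>text1</text></element>
     </mes>
     <mes>
       <element id="159" error="info2"/>
       <element id="183"><text>text2</text></element>
     </mes>
     <mes>
       <element id="159" error="info3"/>
     </mes>
   </messages>'
)

mes <- xml_find_all(txt, "mes")

# Number of "element" children in each "mes" node.
counts <- sapply(mes, function(m) length(xml_find_all(m, "element")))

# All "element" nodes, in document order.
elements <- xml_find_all(mes, "element")

result <- data.frame(
  # Index of the parent "mes" node, repeated once per child element.
  sequence   = rep(seq_along(counts), times = counts),
  element_id = as.numeric(xml_attr(elements, "id")),
  # xml_attr() gives NA where the attribute is absent.
  error      = xml_attr(elements, "error"),
  # xml_find_first() keeps one entry per element; xml_text() gives NA
  # where no <text> child exists.
  text       = xml_text(xml_find_first(elements, "text")),
  stringsAsFactors = FALSE
)

result
```

Using `xml_find_first()` rather than `xml_find_all()` for the `text` column matters here: it returns exactly one entry per input node, so the column stays aligned with the other columns even when an element has no `<text>` child.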