Shreyas Karnik

Reputation: 4133

Selecting specific XML nodes in R?

I am using the XML package in R to parse an XML file that has the following structure:

<document id="Something" origId="Text">
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
        <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
    </sentence>
</document>

I want to store the sentence nodes that contain a <special> tag in one variable, and the sentence nodes without a <special> tag in another variable.

Is it possible to do this in R? Any pointers or answers would be very helpful.

Upvotes: 3

Views: 12178

Answers (2)

Richie Cotton

Reputation: 121077

Parse the XML tree, then use XPath to specify the location of the nodes.

library(XML)
doc <- xmlTreeParse("test.xml", useInternalNodes = TRUE)
# every <special> element anywhere below the root <document>
special_nodes <- getNodeSet(doc, "/document//special")
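The XPath above returns the <special> elements themselves, while the question asks for the enclosing sentences. A minimal sketch, assuming the question's sample is inlined as a string (asText = TRUE) instead of read from test.xml: since these are internal nodes, xmlParent() steps from each <special> back up to its <sentence>.

```r
library(XML)

# Question's sample document, inlined so the snippet runs without test.xml
xml_text <- '<document id="Something" origId="Text">
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
        <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
    </sentence>
</document>'

doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)
special_nodes <- getNodeSet(doc, "/document//special")

# Internal nodes keep a link to their parent, so recover the enclosing elements
parent_sentences <- lapply(special_nodes, xmlParent)
sapply(parent_sentences, xmlName)
```

Note that this only yields the sentences that do have a <special> child; the complementary set still needs its own XPath query.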

Upvotes: 2

Dieter Menne

Reputation: 10215

I added a few more sentences to cover edge cases:

<document id="Something" origId="Text">
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
        <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
    <sentence id="Else" origId="thatorig" text="Blu Blu.">
        <special id="id.s0.i1" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
        <notso id="hallo" />
    </sentence>
    <sentence id="Something no sentence" origId="thisOther" text="Blah Blah.">
    </sentence>
</document>

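library(XML)
doc = xmlInternalTreeParse("sentence.xml")
# sentences that contain a <special> child ("/.." steps up to the parent sentence)
hasSpecial = xpathApply(doc, "//sentence/special/..")
# sentences without any <special> child
noSpecial = xpathApply(doc, "/document/sentence[not(child::special)]")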
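A self-contained variant, sketched under the assumption that the extended sample is inlined as a string rather than read from sentence.xml; the names with_special and without_special are illustrative only. It selects both groups with an XPath predicate ([special] and [not(special)]) instead of the parent step, then uses xmlGetAttr() to read an attribute from each node in the two sets.

```r
library(XML)

# Extended sample document, inlined so the snippet runs without sentence.xml
xml_text <- '<document id="Something" origId="Text">
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
        <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
    <sentence id="Else" origId="thatorig" text="Blu Blu.">
        <special id="id.s0.i1" origId="1" e1="en1" e2="en2" type="" directed="True"/>
    </sentence>
    <sentence id="Something" origId="thisorig" text="Blah Blah.">
        <notso id="hallo" />
    </sentence>
    <sentence id="Something no sentence" origId="thisOther" text="Blah Blah.">
    </sentence>
</document>'

doc <- xmlParse(xml_text, asText = TRUE)

# Sentences with at least one <special> child
with_special <- getNodeSet(doc, "/document/sentence[special]")
# Sentences with no <special> child (other children such as <notso> are allowed)
without_special <- getNodeSet(doc, "/document/sentence[not(special)]")

# Read the origId attribute of every node in each set
sapply(with_special, xmlGetAttr, "origId")
sapply(without_special, xmlGetAttr, "origId")
```

Both node sets come back in document order, so with this sample the first call gives "thisorig" and "thatorig", and the second gives "thisorig" and "thisOther".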

Upvotes: 7
