markthekoala

Reputation: 1085

Converting XMLNodeSet into a well formed XML document

I am trying to extract some information from a web site using R's XML library.

I have downloaded a web page. I then extract some relevant elements from the page using an Xpath expression. Typically this results in about 50 of these relevant elements. I then want to save these relevant items (an XMLNodeSet) as an XML document (so I can analyse the results in an XML editor).

However, before I can save the XMLNodeSet with the XML::saveXML() function, I need to convert it into a well-formed XML document.

Does anybody have any ideas on how to do this using R's XML package? The following is a code snippet:

library(XML)
download.file("https://www.holidayhouses.co.nz/Browse/List.aspx?page=37", "data.html")
doc <- htmlParse("data.html")
# set up x-path
str_x_path_lccg <- "//div[@class = 'ListCard-content group']"
# extract relevant nodes
xml_relevant_nodes <- XML::getNodeSet(doc, str_x_path_lccg)
# xml_relevant_nodes is a node set, not a single well-formed XML document,
# so it needs converting before it can be saved; the following fails
XML::saveXML(xml_relevant_nodes, "test.xml")

Any ideas...?

Upvotes: 3

Views: 707

Answers (1)

markthekoala

Reputation: 1085

Since asking the question, I have learnt a bit more about R's XML package. Here is the answer to the question originally asked:

library(XML)
download.file("https://www.holidayhouses.co.nz/Browse/List.aspx?page=37", "data.html")
doc <- htmlParse("data.html")
# set up x-path
str_x_path_lccg <- "//div[@class = 'ListCard-content group']"
# extract relevant nodes
xml_relevant_nodes <- XML::getNodeSet(doc, str_x_path_lccg)
# need to convert xml_relevant_nodes into a well-formed xml document in order to save it
# first, create a single node which will be the parent of every extracted node
# (the string "topNode" becomes the text content of the <top> element)
xmlDoc <- newXMLNode("top", "topNode", namespace = c(tfm = "http://www.thefactmachine.com"))
# now we can add the node set to the parent node
addChildren(xmlDoc, kids = xml_relevant_nodes)
# a single root node with children is well formed, so it can be saved
XML::saveXML(xmlDoc, "test.xml")
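
For anyone who wants to try the same idea without downloading the page, here is a minimal sketch that runs on a small inline HTML string. The HTML content, the variable names (html, nodes, root, out_file, check, n_div) and the use of a temporary file are my own assumptions for illustration and are not part of the original site or question. It wraps the node set in a single root node, saves it, and then re-parses the file with the strict XML parser as a rough check that the output is well formed.

```r
library(XML)

# Hypothetical stand-in for the downloaded page (not the real site content)
html <- '<html><body>
  <div class="ListCard-content group"><p>First listing</p></div>
  <div class="other"><p>Not relevant</p></div>
  <div class="ListCard-content group"><p>Second listing</p></div>
</body></html>'

# Parse the HTML string and extract the relevant nodes with the same XPath
doc <- htmlParse(html, asText = TRUE)
nodes <- getNodeSet(doc, "//div[@class = 'ListCard-content group']")

# Create a single parent node and attach the extracted nodes to it
root <- newXMLNode("top")
addChildren(root, kids = nodes)

# Save the root node (and its children) to a temporary file
out_file <- tempfile(fileext = ".xml")
saveXML(root, file = out_file)

# Re-parse with xmlParse(), which rejects input that is not well formed
check <- xmlParse(out_file)
n_div <- length(getNodeSet(check, "/top/div"))
```

If xmlParse() reads the file back without an error, the saved file should open in an XML editor. The namespace argument from the code above is left out here only to keep the sketch short.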

Upvotes: 2
