Reputation: 151
I have a question about parsing XML files. My function and a sample XML file are below. In this file, I can parse the subnode "item" and the subnode "tag" without problems, but when I try to parse the subnode "prop", I get a single string with all the values run together. The XML parsing function does not distinguish between them because they all have the same label, "prop". I need the subnode values stored in separate columns of a data.frame. Is there any way to do that?
My function:
PARSE_INVENTORY_items <- function(DF_DEVICE_IDE_value, URL_DEVICE_value) {
  require(XML)
  require(RCurl)
  host <- URL_DEVICE_value
  device <- "/devices/"
  ID_devices <- DF_DEVICE_IDE_value[1, 1]
  inventory <- "/inventory"
  start_device <- "/start"
  FULL_url <- paste(host, device, ID_devices, inventory, sep = "")
  FULL_url_start <- paste(host, device, ID_devices, start_device, sep = "")
  URL_inventory <- gsub(" ", "", FULL_url, fixed = TRUE)
  URL_start_device <- gsub(" ", "", FULL_url_start, fixed = TRUE)
  httpGET(URL_start_device)
  XML_inventory_exists = url.exists(URL_inventory)
  # Regular HTTP
  if (XML_inventory_exists) {
    inventory = getURL(URL_inventory)
    inventory_xml <- xmlInternalTreeParse(inventory)
    items <- getNodeSet(inventory_xml, "//data/inventory/items/item")
    DataFrame_inventory_items <- xmlToDataFrame(items)
    items_tags <- getNodeSet(inventory_xml, "//data/inventory/items/item/tags/tag")
    DataFrame_inventory_tags_subnode <- xmlToDataFrame(items_tags)
    #items_tags_props<-getNodeSet(inventory_xml, "//data/inventory/items/item/tags/tag/props/prop")
    #DataFrame_inventory_props_subnode_tag <- xmlToDataFrame(items_tags_props)
    DataFrame_inventory_items <- cbind(DataFrame_inventory_items, DataFrame_inventory_tags_subnode)
    #aux<-DataFrame_inventory_items
    #DataFrame_inventory_items<-subset(DataFrame_inventory_items, select=(-tags))
    return(DataFrame_inventory_items)
  }
}
Example of XML file
<?xml version="1.0" encoding="UTF-8"?>
<inventory>
  <type>inventory</type>
  <ts>1396964708000</ts>
  <status>OK</status>
  <msg-version>2.0.0</msg-version>
  <op>inventory</op>
  <data>
    <advanNetId>AdvanNet-instance-00:26:b9:08:cd:e1-3161</advanNetId>
    <deviceId>adrd1</deviceId>
    <inventory>
      <class>INVENTORY</class>
      <deviceId>adrd1</deviceId>
      <timeWindow>2500</timeWindow>
      <items>
        <item>
          <class>READ_EVENT</class>
          <epc>00000000000000000000A200</epc>
          <ts>1396964708122</ts>
          <deviceId>adrd1</deviceId>
          <tags>
            <tag>
              <class>CONTEXT_TAG_DATA</class>
              <hexepc>00000000000000000000A200</hexepc>
              <props>
                <prop>RF_PHASE:154</prop>
                <prop>READ_COUNT:1</prop>
                <prop>RSSI:-55</prop>
                <prop>TIME_STAMP:1396964708122</prop>
                <prop>ANTENNA_PORT:1</prop>
              </props>
            </tag>
          </tags>
          <tag-rssi>-55.0</tag-rssi>
          <tag-readcount>1</tag-readcount>
          <tag-phase>154.0</tag-phase>
        </item>
      </items>
    </inventory>
  </data>
</inventory>
Upvotes: 0
Views: 1021
Reputation: 59345
So your XML is still not well-formed (missing closing tag for <items>), but close enough to be usable.
The code below creates a data frame from the contents of the <tags> element, with one row for each <tag> element, and with columns for <class>, <hexepc> and each of the <prop> elements. The column names for the different <prop> elements are parsed out of the text (so, RF_PHASE, READ_COUNT, etc.). Note that this works as long as each <tag> has the same <props>.
In this example, the XML you provided (corrected) is called xml.text.
library(XML)
xml <- xmlInternalTreeParse(xml.text, useInternalNodes = TRUE)
# add a few extra tag nodes - you have this already
tags <- xml["//data/inventory/items/item/tags"]
tag <- xml["//data/inventory/items/item/tags/tag"]
addChildren(node = tags[[1]], xmlClone(tag[[1]]))
addChildren(node = tags[[1]], xmlClone(tag[[1]]))
addChildren(node = tags[[1]], xmlClone(tag[[1]]))
# this is where you start
tags <- xml["//data/inventory/items/item/tags/tag"]
result <- do.call(rbind, lapply(tags, function(tag) {
  class <- xmlValue(tag["class"][[1]])
  hexepc <- xmlValue(tag["hexepc"][[1]])
  # one string per <prop>, e.g. "RF_PHASE:154"
  props <- sapply(tag["props"]$props["prop"], xmlValue)
  # split each string into its name and value at the colon
  props <- strsplit(props, ":")
  # values become the elements, names become the column names
  props <- setNames(sapply(props, function(x) x[2]), sapply(props, function(x) x[1]))
  c(class = class, hexepc = hexepc, props)
}))
result <- data.frame(result)
#              class                   hexepc RF_PHASE READ_COUNT RSSI    TIME_STAMP ANTENNA_PORT
# 1 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 2 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 3 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
# 4 CONTEXT_TAG_DATA 00000000000000000000A200      154          1  -55 1396964708122            1
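If some <tag> elements carry a different set of <prop> entries, the rbind approach above will misalign columns. The following is only a sketch of one possible workaround in base R, not part of the XML package: parse_props and bind_props are hypothetical helper names, and the second tag's values are made up for illustration. It works on the "NAME:VALUE" strings once they have been extracted with xmlValue, splits each at the first colon, and fills props that a tag lacks with NA.

```r
# Hypothetical helper: turn "NAME:VALUE" strings into a named character vector.
parse_props <- function(props) {
  keys   <- sub(":.*$", "", props)      # text before the first colon
  values <- sub("^[^:]*:", "", props)   # text after the first colon
  setNames(values, keys)
}

# Hypothetical helper: row-bind named vectors whose names may differ,
# filling any prop a row does not have with NA.
bind_props <- function(rows) {
  all_keys <- unique(unlist(lapply(rows, names)))
  filled <- lapply(rows, function(r) setNames(r[all_keys], all_keys))
  data.frame(do.call(rbind, filled), stringsAsFactors = FALSE)
}

# First tag: the props from the sample XML in the question.
tag1 <- parse_props(c("RF_PHASE:154", "READ_COUNT:1", "RSSI:-55",
                      "TIME_STAMP:1396964708122", "ANTENNA_PORT:1"))
# Second tag: made-up values with fewer props, for illustration only.
tag2 <- parse_props(c("RF_PHASE:160", "RSSI:-60"))

result <- bind_props(list(tag1, tag2))
# Convert the text columns to numbers where the contents allow it.
result[] <- lapply(result, type.convert, as.is = TRUE)
print(result)
```

In the answer's lapply, the three props lines could be swapped for a call to parse_props, with bind_props taking the place of do.call(rbind, ...), assuming the same tag structure as in the sample.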
Upvotes: 1