Reputation: 151
I'm starting a project in R and I have to parse an XML file. I'm using the XML package and functions such as xmlToDataFrame and xmlParse. I want to store the information in a structured way in a data frame, but I've run into a problem: I cannot get the variables inside a node into separate columns. With the functions mentioned above, all the values of those variables end up in a single cell, in a single row of the data frame.
The XML I use is as follows:
<?xml version="1.0" encoding="UTF-8"?>
-<rest-response>
<type>rest-response</type>
<time-stamp>1392217780000</time-stamp>
<status>OK</status>
<msg-version>1.0.0</msg-version>
<op>inventory</op>
-<response>
<inventorySize>3</inventorySize>
<inventoryMode>SYNCHRONOUS</inventoryMode>
<time>4952</time>
-<items>
-<item>
<epc>00000000000000000000A195</epc>
<ts>1392217779060</ts>
<location-id>adtr</location-id>
<location-pos>0,0,0</location-pos>
<device-id>adtr@1</device-id>
<device-reader>192.168.1.224</device-reader>
<device-readerPort>1</device-readerPort>
<device-readerMuxPort>0</device-readerMuxPort>
<device-readerMuxPort2>0</device-readerMuxPort2>
<tag-rssi>-49.0</tag-rssi>
<tag-readcount>36.0</tag-readcount>
<tag-phase>168.0</tag-phase>
</item>
-<item>
<epc>00000000000000000000A263</epc>
<ts>1392217779065</ts>
<location-id>adtr</location-id>
<location-pos>0,0,0</location-pos>
<device-id>adtr@1</device-id>
<device-reader>192.168.1.224</device-reader>
<device-readerPort>1</device-readerPort>
<device-readerMuxPort>0</device-readerMuxPort>
<device-readerMuxPort2>0</device-readerMuxPort2>
<tag-rssi>-49.0</tag-rssi>
<tag-readcount>36.0</tag-readcount>
<tag-phase>0.0</tag-phase>
</item>
-<item>
<epc>B00000000000001101080802</epc>
<ts>1392217779323</ts>
<location-id>adtr</location-id>
<location-pos>0,0,0</location-pos>
<device-id>adtr@1</device-id>
<device-reader>192.168.1.224</device-reader>
<device-readerPort>1</device-readerPort>
<device-readerMuxPort>0</device-readerMuxPort>
<device-readerMuxPort2>0</device-readerMuxPort2>
<tag-rssi>-72.0</tag-rssi>
<tag-readcount>27.0</tag-readcount>
<tag-phase>157.0</tag-phase>
</item>
</items>
</response>
</rest-response>
Everything inside each item comes back as a single value, and I want to split it into its separate fields.
Another important point: the content of the XML may change, but its structure will always be the same, although there may be more items.
Any ideas?
Upvotes: 1
Views: 413
Reputation: 59355
I assume you want the <items> in a data frame. Assuming your XML is in the variable xml.text, this will work:
library(XML)
xml <- xmlInternalTreeParse(xml.text) # assumes your xml in variable xml.text
items <- getNodeSet(xml,"//items/item")
df <- xmlToDataFrame(items)
df
#                        epc            ts location-id location-pos device-id device-reader device-readerPort device-readerMuxPort device-readerMuxPort2 tag-rssi tag-readcount tag-phase
# 1 00000000000000000000A195 1392217779060        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -49.0          36.0     168.0
# 2 00000000000000000000A263 1392217779065        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -49.0          36.0       0.0
# 3 B00000000000001101080802 1392217779323        adtr        0,0,0    adtr@1 192.168.1.224                 1                    0                     0    -72.0          27.0     157.0
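As far as I know, xmlToDataFrame returns every column as text (character or factor) unless you tell it otherwise, so the numeric fields still need converting. Here is a minimal sketch of one way to do that. The small df below is a hand-built stand-in for the result above so the snippet runs on its own, and treating ts as milliseconds since 1970-01-01 UTC is my assumption (the values look like it, but the question does not say so).

```r
# Stand-in for a few columns of the df produced above, built by hand so this
# runs without the XML package. check.names = FALSE keeps the hyphenated names.
df <- data.frame(
  epc = c("00000000000000000000A195", "00000000000000000000A263"),
  ts = c("1392217779060", "1392217779065"),
  `tag-rssi` = c("-49.0", "-49.0"),
  `tag-readcount` = c("36.0", "36.0"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Columns that hold numbers; as.character() first in case they are factors.
num_cols <- c("tag-rssi", "tag-readcount")
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(as.character(x)))

# Assumption: ts is milliseconds since the Unix epoch, in UTC.
df$ts <- as.POSIXct(as.numeric(as.character(df$ts)) / 1000,
                    origin = "1970-01-01", tz = "UTC")

str(df)
```

Because the column names contain hyphens, refer to them with df[["tag-rssi"]] or backticks rather than df$tag-rssi.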
I also assumed that you displayed this XML in a browser and copied and pasted it (which would explain the -<tag>). Otherwise, your XML is not well-formed.
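If the text you have really does contain those leading dashes, one option is to strip them before parsing. This is only a sketch: raw.text is a hypothetical, shortened sample, and the pattern assumes the dash always sits at the start of a line directly in front of a tag, as in the question.

```r
# Hypothetical raw text as copied from a browser's XML tree view.
raw.text <- '<?xml version="1.0" encoding="UTF-8"?>
-<items>
-<item>
<epc>00000000000000000000A195</epc>
<tag-rssi>-49.0</tag-rssi>
</item>
</items>'

# Remove a "-" only when it starts a line (optionally after spaces or tabs)
# and is immediately followed by "<". Values such as -49.0 sit inside an
# element, not at the start of a line, so they are left alone.
xml.text <- gsub("(?m)^[ \\t]*-(?=<)", "", raw.text, perl = TRUE)

cat(xml.text)
```

The cleaned xml.text can then be passed to the parsing code shown earlier.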
Upvotes: 2