Reputation: 6488
I have a following XML file
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<t:Forecast xmlns:t="http://example.com">
<Sender Abbreviation="abc" Name="xyz"/>
<Recipient Abbreviation="efg" Name="cba"/>
<createdUTC>2017-11-24T10:41:11Z</createdUTC>
<MessageID>bcjs</MessageID>
<SystemState>test</SystemState>
<ForecastData>
<DataHeader GroupKey="rkolo">
<Timeseries ID="abc123">
<TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="858"/>
<TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="868"/>
</Timeseries>
<Timeseries ID="xyz">
<TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="870"/>
<TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="890"/>
</Timeseries>
</ForecastData>
</t:Forecast>
I would like to have two data frames
TimeInt out
2017-11-24T10:45:00Z 858
2017-11-24T11:45:00Z 868
and another data frame as
TimeInt out
2017-11-24T10:45:00Z 870
2017-11-24T11:45:00Z 890
So far, I have done as follows:
require(XML)
temp = xmlParse("datafile.xml")
data = xmlToList(temp)
But the output of data
contains lot of nested lists. How do I get the data frames?
Edit 1: changed out
values
Upvotes: 0
Views: 56
Reputation: 107587
Consider the triple-colon method xmlAttrsToDataFrame
but looping through each node index of Timeseries, even name each element with the corresponding Timeseries id.
library(XML)
txt='<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<t:Forecast xmlns:t="http://example.com">
<Sender Abbreviation="abc" Name="xyz"/>
<Recipient Abbreviation="efg" Name="cba"/>
<createdUTC>2017-11-24T10:41:11Z</createdUTC>
<MessageID>bcjs</MessageID>
<SystemState>test</SystemState>
<ForecastData>
<DataHeader GroupKey="rkolo"/>
<Timeseries ID="abc123">
<TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="858"/>
<TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="858"/>
</Timeseries>
<Timeseries ID="xyz">
<TimeInt ISTUTC="2017-11-24T10:45:00Z" Out="870"/>
<TimeInt ISTUTC="2017-11-24T11:45:00Z" Out="870"/>
</Timeseries>
</ForecastData>
</t:Forecast>'
doc <- xmlParse(txt)
dfList <- lapply(1:length(xpathSApply(doc, "//Timeseries", xmlAttrs)), function(i)
XML:::xmlAttrsToDataFrame(getNodeSet(doc, path=paste0('//Timeseries[',i,']/TimeInt')))
)
dfList <- setNames(dfList, xpathSApply(doc, path='//Timeseries', xmlAttrs))
dfList
Output
dfList$abc123
# ISTUTC Out
# 1 2017-11-24T10:45:00Z 858
# 2 2017-11-24T11:45:00Z 858
dfList$xyz
# ISTUTC Out
# 3 2017-11-24T10:45:00Z 870
# 4 2017-11-24T11:45:00Z 870
Upvotes: 1