Reputation: 79
I have the following xml structure
<Resultados>
<Zona_Carga>
<zona_carga>ACAPULCO</zona_carga>
<Valores>
<Valor>
<fecha>2017-04-19</fecha>
<hora>1</hora>
<pz>1063.91</pz>
<pz_ene>968.54</pz_ene>
<pz_per>95.37</pz_per>
<pz_cng>0</pz_cng>
</Valor>
<Valor>
<fecha>2017-04-19</fecha>
<hora>2</hora>
<pz>1000.3</pz>
<pz_ene>909.27</pz_ene>
<pz_per>91.03</pz_per>
<pz_cng>0</pz_cng>
</Valor> ...
Where I have many different "zona_carga" values. I want to make a data frame like this one:
fecha hora pz pz_ene pz_per pz_cng
1 2017-04-19 1 1063.91 968.54 95.37 0
2 2017-04-19 2 1000.3 909.27 91.03 0
3 2017-04-19 3 954.25 868.84 85.41 0
4 2017-04-19 4 939.86 855.47 84.4 0
5 2017-04-19 5 931.78 847.65 84.14 0
6 2017-04-19 6 955.97 870.63 85.34 0
7 2017-04-19 7 1025.11 943.27 81.83 0
8 2017-04-19 8 1111.06 1030.09 82.52 -1.55
9 2017-04-19 9 1422.03 1310.35 111.89 -0.21
10 2017-04-19 10 1497.15 1388.88 108.27 0
11 2017-04-19 11 1482.88 1381.24 101.65 0
12 2017-04-19 12 1483.35 1405.45 77.9 0
13 2017-04-19 13 1479.93 1405.95 73.98 0
14 2017-04-19 14 1427.65 1346.85 80.8 0
with an additional column showing which zona_carga each value corresponds. This is the code I'm using:
url = getURL("https://ws01.cenace.gob.mx:8082/SWPEND/SIM/SIN/MDA/ACAPULCO,AGUASCALIENTES/2017/04/19/2017/04/19/XML")
xmlfile <- xmlParse(url)
a <- xmlToDataFrame(getNodeSet(xmlfile,"//*/Valor"), collectNames=T)
Any suggestion? Thanks!
Upvotes: 0
Views: 42
Reputation: 78792
library(xml2)
library(httr)
library(tidyverse)
# Cross-platform, caching-download (be nice and don't waste bandwidth that isn't yours)
httr::GET(
url = "https://ws01.cenace.gob.mx:8082/SWPEND/SIM/SIN/MDA/ACAPULCO,AGUASCALIENTES/2017/04/19/2017/04/19/XML",
httr::write_disk("~/Data/ws01.xml"),
httr::progress()
) -> res
doc <- read_xml("~/Data/ws01.xml")
# this has namesapces so it's a pain and we'll make it painless
doc <- xml_ns_strip(doc)
xml_find_all(doc, ".//Resultados/Zona_Carga") %>% # get main sections
map_df(~{
zone <- xml_find_all(.x, ".//zona_carga") %>% xml_text() # pull out the zone
xml_find_all(.x, ".//Valores/Valor") %>% # get all the sub-records
map_df(~{
ch <- xml_children(.x) # get all the kids
as.list(set_names( # mash them together
xml_text(ch),
xml_name(ch)
))
}) %>%
mutate(zona_carga = zone) # add in the zone
}) %>%
readr::type_convert() # get better column types
Which produces:
## # A tibble: 48 x 7
## fecha hora pz pz_ene pz_per pz_cng zona_carga
## <date> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 2017-04-19 1 1064. 969. 95.4 0 ACAPULCO
## 2 2017-04-19 2 1000. 909. 91.0 0 ACAPULCO
## 3 2017-04-19 3 954. 869. 85.4 0 ACAPULCO
## 4 2017-04-19 4 940. 855. 84.4 0 ACAPULCO
## 5 2017-04-19 5 932. 848. 84.1 0 ACAPULCO
## 6 2017-04-19 6 956. 871. 85.3 0 ACAPULCO
## 7 2017-04-19 7 1025. 943. 81.8 0 ACAPULCO
## 8 2017-04-19 8 1111. 1030. 82.5 -1.55 ACAPULCO
## 9 2017-04-19 9 1422. 1310. 112. -0.21 ACAPULCO
## 10 2017-04-19 10 1497. 1389. 108. 0 ACAPULCO
## # ... with 38 more rows
Upvotes: 1