Extract correct value on xml node R

Question

I have the following xml structure

Where I have many different "zona_carga" values. I want to make a data frame like this one:

        fecha hora      pz  pz_ene pz_per pz_cng
1  2017-04-19    1 1063.91  968.54  95.37      0
2  2017-04-19    2  1000.3  909.27  91.03      0
3  2017-04-19    3  954.25  868.84  85.41      0
4  2017-04-19    4  939.86  855.47   84.4      0
5  2017-04-19    5  931.78  847.65  84.14      0
6  2017-04-19    6  955.97  870.63  85.34      0
7  2017-04-19    7 1025.11  943.27  81.83      0
8  2017-04-19    8 1111.06 1030.09  82.52  -1.55
9  2017-04-19    9 1422.03 1310.35 111.89  -0.21
10 2017-04-19   10 1497.15 1388.88 108.27      0
11 2017-04-19   11 1482.88 1381.24 101.65      0
12 2017-04-19   12 1483.35 1405.45   77.9      0
13 2017-04-19   13 1479.93 1405.95  73.98      0
14 2017-04-19   14 1427.65 1346.85   80.8      0

with an additional column showing which zona_carga each value corresponds. This is the code I'm using:

url = getURL("https://ws01.cenace.gob.mx:8082/SWPEND/SIM/SIN/MDA/ACAPULCO,AGUASCALIENTES/2017/04/19/2017/04/19/XML")
xmlfile <- xmlParse(url)
a <- xmlToDataFrame(getNodeSet(xmlfile,"//*/Valor"), collectNames=T)

Any suggestion? Thanks!

hrbrmstr · Accepted Answer

library(xml2)
library(httr)
library(tidyverse)

# Cross-platform, caching-download (be nice and don't waste bandwidth that isn't yours)
httr::GET(
  url = "https://ws01.cenace.gob.mx:8082/SWPEND/SIM/SIN/MDA/ACAPULCO,AGUASCALIENTES/2017/04/19/2017/04/19/XML",
  httr::write_disk("~/Data/ws01.xml"),
  httr::progress()
) -> res

doc <- read_xml("~/Data/ws01.xml")

# this has namesapces so it's a pain and we'll make it painless
doc <- xml_ns_strip(doc)

xml_find_all(doc, ".//Resultados/Zona_Carga") %>%  # get main sections
  map_df(~{
    zone <- xml_find_all(.x, ".//zona_carga") %>% xml_text() # pull out the zone
    xml_find_all(.x, ".//Valores/Valor") %>% # get all the sub-records
      map_df(~{
        ch <- xml_children(.x) # get all the kids
        as.list(set_names( # mash them together
          xml_text(ch),
          xml_name(ch)
        ))
      }) %>%
      mutate(zona_carga = zone) # add in the zone
  }) %>%
  readr::type_convert() # get better column types

Which produces:

## # A tibble: 48 x 7
##    fecha       hora    pz pz_ene pz_per pz_cng zona_carga
##                      
##  1 2017-04-19     1 1064.   969.   95.4   0    ACAPULCO  
##  2 2017-04-19     2 1000.   909.   91.0   0    ACAPULCO  
##  3 2017-04-19     3  954.   869.   85.4   0    ACAPULCO  
##  4 2017-04-19     4  940.   855.   84.4   0    ACAPULCO  
##  5 2017-04-19     5  932.   848.   84.1   0    ACAPULCO  
##  6 2017-04-19     6  956.   871.   85.3   0    ACAPULCO  
##  7 2017-04-19     7 1025.   943.   81.8   0    ACAPULCO  
##  8 2017-04-19     8 1111.  1030.   82.5  -1.55 ACAPULCO  
##  9 2017-04-19     9 1422.  1310.  112.   -0.21 ACAPULCO  
## 10 2017-04-19    10 1497.  1389.  108.    0    ACAPULCO  
## # ... with 38 more rows

Extract correct value on xml node R

Answers (1)

Related Questions