Henry Navarro

Reputation: 953

How to parse a list of nested lists and convert it to a data frame in R?

I am sending a request to a URL that returns XML.

I can parse the XML, but I then want to convert the result to a data frame.

The main problem is that the parsed result is a list of nested lists, and I don't know how to convert it to a data frame.

This is my code:

# Install the packages if they are missing, then load them
if (!require("httr")) { install.packages("httr", quiet = TRUE); library("httr") }
if (!require("XML")) { install.packages("XML", quiet = TRUE); library("XML") }

url <- "http://ovc.catastro.meh.es/ovcservweb/OVCSWLocalizacionRC/OVCCallejero.asmx/Consulta_DNPLOC"

# Form fields sent with the request
parameters_request <- list(Provincia = "Madrid",
                           Municipio = "Madrid",
                           Sigla = "PS",
                           Calle = "Delicias",
                           Numero = "81",
                           Bloque = "",
                           Escalera = "",
                           Planta = "",
                           Puerta = "")

# Send the parameters as a form-encoded POST body
result_request <- POST(url, body = parameters_request, encode = "form")

content_request <- content(result_request)

# Parse the XML response and convert it to a nested list
content_request <- xmlToList(xmlParse(content_request))

Does anyone know how to convert it?

Thanks in advance.

Upvotes: 1

Views: 211

Answers (1)

r2evans

Reputation: 160607

We can use data.frame to flatten each item into a one-row data frame, then do.call(rbind, ...) to combine them all. Unfortunately, rbind requires every row to have the same column names, which is not the case here:

do.call(rbind, lapply(content_request[[2]][32:33], data.frame, stringsAsFactors = FALSE))
# Error in rbind(deparse.level, ...) : 
#   numbers of columns of arguments do not match
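
As a rough way to see which items are responsible, we can compare the flattened column names of each item against the union of all names. The sketch below uses a small made-up list (`items`, with invented field values) in place of `content_request[[2]]`, so it is an illustration and not the real response:

```r
# Toy stand-in for content_request[[2]]; structure and values are assumptions.
items <- list(
  list(rc = list(pc1 = "1222924", car = "0001"),
       dt = list(np = "MADRID", nm = "MADRID")),
  list(rc = list(pc1 = "1222924", car = "0002"),
       dt = list(np = "MADRID", nm = "MADRID", td = "X"))
)

# Column names each item gets once data.frame() has flattened it
cols <- lapply(items, function(x) names(data.frame(x, stringsAsFactors = FALSE)))

# Union of all column names, then the columns each item lacks
all_cols <- unique(unlist(cols))
missing_cols <- lapply(cols, function(nm) setdiff(all_cols, nm))
print(missing_cols)
```

Any item with a non-empty entry in `missing_cols` lacks a column that another item has, so it cannot be combined with plain rbind.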

To fix that, there are manual workarounds, or we can use either data.table::rbindlist (with fill = TRUE) or dplyr::bind_rows, both of which are more flexible when the items have different sets of names. Both options below give the same output (classes notwithstanding).

head(
  dplyr::bind_rows(lapply(content_request[[2]], data.frame, stringsAsFactors = FALSE))
  # data.table::rbindlist(lapply(content_request[[2]], data.frame, stringsAsFactors = FALSE), fill = TRUE)
)
#    rc.pc1  rc.pc2 rc.car rc.cc1 rc.cc2 dt.loine.cp dt.loine.cm dt.cmc  dt.np  dt.nm
# 1 1222924 VK4712A   0001      M      B          28          79    900 MADRID MADRID
# 2 1222924 VK4712A   0002      Q      Z          28          79    900 MADRID MADRID
# 3 1222924 VK4712A   0003      W      X          28          79    900 MADRID MADRID
# 4 1222924 VK4712A   0004      E      M          28          79    900 MADRID MADRID
# 5 1222924 VK4712A   0005      R      Q          28          79    900 MADRID MADRID
# 6 1222924 VK4712A   0006      T      W          28          79    900 MADRID MADRID
#   dt.locs.lous.lourb.dir.cv dt.locs.lous.lourb.dir.tv dt.locs.lous.lourb.dir.nv
# 1                      1675                        PS                  DELICIAS
# 2                      1675                        PS                  DELICIAS
# 3                      1675                        PS                  DELICIAS
# 4                      1675                        PS                  DELICIAS
# 5                      1675                        PS                  DELICIAS
# 6                      1675                        PS                  DELICIAS
#   dt.locs.lous.lourb.dir.pnp dt.locs.lous.lourb.dir.snp dt.locs.lous.lourb.loint.es
# 1                         81                          0                           1
# 2                         81                          0                           1
# 3                         81                          0                           1
# 4                         81                          0                           1
# 5                         81                          0                           1
# 6                         81                          0                           1
#   dt.locs.lous.lourb.loint.pt dt.locs.lous.lourb.loint.pu dt.locs.lous.lourb.dp
# 1                          -1                           A                 28045
# 2                          -1                           B                 28045
# 3                          -1                           C                 28045
# 4                          00                          01                 28045
# 5                          00                          02                 28045
# 6                          00                          03                 28045
#   dt.locs.lous.lourb.dm dt.locs.lous.lourb.dir.td
# 1                     2                      <NA>
# 2                     2                      <NA>
# 3                     2                      <NA>
# 4                     2                      <NA>
# 5                     2                      <NA>
# 6                     2                      <NA>
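
For completeness, here is a minimal sketch of one possible manual workaround in base R: add every missing column as NA before calling rbind. It runs on a small made-up list (`items`, with invented values) standing in for `content_request[[2]]`, and the helper names are only illustrative:

```r
# Toy stand-in for content_request[[2]]; structure and values are assumptions.
items <- list(
  list(rc = list(pc1 = "1222924", car = "0001"),
       dt = list(np = "MADRID", nm = "MADRID")),
  list(rc = list(pc1 = "1222924", car = "0002"),
       dt = list(np = "MADRID", nm = "MADRID", td = "X"))
)

# Flatten each item into a one-row data frame
rows <- lapply(items, data.frame, stringsAsFactors = FALSE)

# Union of the column names across all rows
all_cols <- unique(unlist(lapply(rows, names)))

# Add each missing column as NA, then put the columns in a common order
rows_filled <- lapply(rows, function(df) {
  for (nm in setdiff(all_cols, names(df))) {
    df[[nm]] <- NA_character_
  }
  df[all_cols]
})

# Every row now has the same columns, so rbind no longer fails
combined <- do.call(rbind, rows_filled)
print(combined)
```

This is roughly what bind_rows and rbindlist(fill = TRUE) save us from writing by hand, so in practice either of those is usually the more convenient choice.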

Upvotes: 1
