Reputation: 7107
I have a data frame with some nested data and I would like to flatten the data and extract specific cells from the nested data.
The nested data is in the column MetaData
which looks like the following:
[[1]]
Id Variable.Id Variable.Nombre Variable.Codigo Nombre Codigo
1 72 3 Tipo de dato Dato base
2 5457 19 Municipios MUN Abrera 08001
3 274520 260 Conceptos Demográficos Edad media de la población
[[2]]
Id Variable.Id Variable.Nombre Variable.Codigo Nombre Codigo
1 72 3 Tipo de dato Dato base
2 366833 260 Conceptos Demográficos Porcentaje de hogares unipersonales
3 327739 846 Distritos DIST Badalona distrito 02 0801502
[[3]]
Id Variable.Id Variable.Nombre Variable.Codigo Nombre Codigo
1 72 3 Tipo de dato Dato base
2 366833 260 Conceptos Demográficos Porcentaje de hogares unipersonales
3 331103 847 Secciones SECC Santa Coloma de Gramenet sección 05009 0824505009
I want to extract -
From [[1]]
the municipios
, MUN
, Abrera
and 08001
From [[2]]
the Distritos
, DIST
, Badalona distrito 02
and 0801502
From [[3]]
the Secciones
, SECC
, Santa Columa de Gramenet sección 05009
and 0824505009
.
However, just using the cell location does not work since the location for the MUN
data is in a slightly different location for the DIST
and SECC
data - i.e. for MUN
under the column Nombre
and Codigo
the cell location is in row 2, whereas for the DIST
and SECC
data the cell location is in row 3.
I have the following code which can extract the data for the MUN
without problem.
data2 <- data %>%
mutate(MetaDataWider = map(MetaData, ~ {
v1 <- .x[cbind(c(2, 3, 2, 2, 3), c(3, 3, 4, 5, 5))]
names(v1) <- c("type", "contable", "type_code", "region", "variable")
as_tibble_row(v1)
})
)
I now want to add an ifelse
statement to the map
/map_if
in order to correctly extract the data for the DIST
and SECC
observations.
Alternatively, create the ifelse
statment to change the order of the DIST
and SECC
observations. i.e.
if
Variable.Codigo
containsDIST
|SECC
shift row 3 to row 2, else nothing.
Then I can use the code I already have to extract the data.
Data:
data <- structure(list(COD = c("ADRH7218704", "ADRH7013747", "ADRH6909920"
), Nombre = c("Abrera. Edad media de la población. Dato base. ",
"Badalona distrito 02. Porcentaje de hogares unipersonales. Dato base. ",
"Santa Coloma de Gramenet sección 05009. Porcentaje de hogares unipersonales. Dato base. "
), T3_Unidad = c("Años", "Porcentaje", "Porcentaje"), T3_Escala = c(" ",
" ", " "), MetaData = list(structure(list(Id = c(72L, 5457L,
274520L), Variable = structure(list(Id = c(3L, 19L, 260L), Nombre = c("Tipo de dato",
"Municipios", "Conceptos Demográficos"), Codigo = c("", "MUN",
"")), class = "data.frame", row.names = c(NA, 3L)), Nombre = c("Dato base",
"Abrera", "Edad media de la población"), Codigo = c("", "08001",
"")), class = "data.frame", row.names = c(NA, 3L)), structure(list(
Id = c(72L, 366833L, 327739L), Variable = structure(list(
Id = c(3L, 260L, 846L), Nombre = c("Tipo de dato", "Conceptos Demográficos",
"Distritos"), Codigo = c("", "", "DIST")), class = "data.frame", row.names = c(NA,
3L)), Nombre = c("Dato base", "Porcentaje de hogares unipersonales",
"Badalona distrito 02"), Codigo = c("", "", "0801502")), class = "data.frame", row.names = c(NA,
3L)), structure(list(Id = c(72L, 366833L, 331103L), Variable = structure(list(
Id = c(3L, 260L, 847L), Nombre = c("Tipo de dato", "Conceptos Demográficos",
"Secciones"), Codigo = c("", "", "SECC")), class = "data.frame", row.names = c(NA,
3L)), Nombre = c("Dato base", "Porcentaje de hogares unipersonales",
"Santa Coloma de Gramenet sección 05009"), Codigo = c("", "",
"0824505009")), class = "data.frame", row.names = c(NA, 3L))),
Data = list(structure(list(Fecha = c("2018-01-01T00:00:00.000+01:00",
"2017-01-01T00:00:00.000+01:00", "2016-01-01T00:00:00.000+01:00",
"2015-01-01T00:00:00.000+01:00"), T3_TipoDato = c("Definitivo",
"Definitivo", "Definitivo", "Definitivo"), T3_Periodo = c("A",
"A", "A", "A"), Anyo = 2018:2015, Valor = c(39.7, 39.5, 39.2,
38.8)), class = "data.frame", row.names = c(NA, 4L)), structure(list(
Fecha = c("2018-01-01T00:00:00.000+01:00", "2017-01-01T00:00:00.000+01:00",
"2016-01-01T00:00:00.000+01:00", "2015-01-01T00:00:00.000+01:00"
), T3_TipoDato = c("Definitivo", "Definitivo", "Definitivo",
"Definitivo"), T3_Periodo = c("A", "A", "A", "A"), Anyo = 2018:2015,
Valor = c(25.5, 25.7, 25.5, 25.8)), class = "data.frame", row.names = c(NA,
4L)), structure(list(Fecha = c("2018-01-01T00:00:00.000+01:00",
"2017-01-01T00:00:00.000+01:00", "2016-01-01T00:00:00.000+01:00",
"2015-01-01T00:00:00.000+01:00"), T3_TipoDato = c("Definitivo",
"Definitivo", "Definitivo", "Definitivo"), T3_Periodo = c("A",
"A", "A", "A"), Anyo = 2018:2015, Valor = c(24.1, 23.6, 22.2,
20.9)), class = "data.frame", row.names = c(NA, 4L)))), row.names = c(NA,
-3L), class = "data.frame")
Upvotes: 0
Views: 184
Reputation: 887118
We can use rbindlist
library(data.table)
rbindlist(lapply(data$MetaData, function(x) {
do.call(data.frame, subset(x, Variable$Codigo != ""))
}))
-output
Id Variable.Id Variable.Nombre Variable.Codigo Nombre Codigo
1: 5457 19 Municipios MUN Abrera 08001
2: 327739 846 Distritos DIST Badalona distrito 02 0801502
3: 331103 847 Secciones SECC Santa Coloma de Gramenet sección 05009 0824505009
Upvotes: 1
Reputation: 388982
How about using purrr::map_df
?
purrr::map_df(data$MetaData, ~ {
.x[.x$Variable$Codigo != '', ]
})
Upvotes: 1