Reputation: 11
I'm trying to unpack a dataframe with columns that contain sub dataframes in each row.
The problem is that the sub dataframes in each column have different sizes (e.g. 1x3, 2x3 and 2x2). Moreover, one column in a sub dataframe (Conversions.Value) has a different data type from row to row (numeric and character). During the unpacking process, I get error messages like 'can't recycle input of size 3 to size 2.' or 'Can't combine ..1$Conversions$Value and ..6$Conversions$Value.'
The structure is below:
structure(list
(Conversions = list(structure(list(Field = "Volume",
Unit = "m3", Value = 338L), class = "data.frame", row.names = 1L),
structure(list(Field = "Volume", Unit = "m3", Value = 450L), class = "data.frame", row.names = 1L)),
Categories = list(structure(list(CategorySystem = c("Base",
NA), Title = c("Mineral materials and glass (excluding concrete)", "213.7 Kevytbetoni, Aerated concrete"), ClassificationType = c(NA,
"Talo2000")), class = "data.frame", row.names = 1:2), structure(list(
CategorySystem = c("Base", NA), Title = c("Mineral materials and glass (excluding concrete)",
"213.7 Kevytbetoni, Aerated concrete"), ClassificationType = c(NA,
"Talo2000")), class = "data.frame", row.names = 1:2)),
DataItems.DataValueItems = list(structure(list(DataModuleCode = c("A1-A3 Conservative", "A1-A3 Typical"), Value = c(0.43, 0.36)), class = "data.frame", row.names = 1:2),
structure(list(DataModuleCode = c("A1-A3 Conservative",
"A1-A3 Typical"), Value = c(0.41, 0.34)), class = "data.frame", row.names = 1:2)),
ResourceId = c(7000000995, 7000000996)), row.names = 1:2, class = "data.frame")
So far I've tried:
unnest_wider(df, col = 1:3, names_repair = "universal")
# WORKED, BUT multiple observations end up as a list in one row,
# and the lists have different lengths
unnest_longer(df, col = 1:3, names_repair = "universal") %>%
mutate(across(.fns = as.character)) %>%
type_convert()
# ERROR Can't combine `..1$Conversions$Value` <integer> and `..6$Conversions$Value` <character>.
df$Conversions=lapply(df$Conversions, FUN=as.character)
unnest_longer(df, col = 1:3, names_repair = "universal") %>%
mutate(across(.fns = as.character)) %>%
type_convert()
#ERROR ! In row 1, can't recycle input of size 3 to size 2.
Ideally, this is what the outcome would look like:
EDIT
rbindlist worked, but only when applied to each column separately. That way I lose the primary identifier of each row (ResourceId), and the data can no longer be rejoined.
rbindlist(lapply(df$Conversions, as.data.frame.list), fill=TRUE)
rbindlist(lapply(df$Categories, as.data.frame.list), fill=TRUE)
rbindlist(lapply(df$DataItems.DataValueItems, as.data.frame.list), fill=TRUE)
How do I paste the ResourceId into the dataframe structure of each column, so that applying rbindlist afterwards gives a result with a column containing the respective ResourceId values?
Upvotes: 0
Views: 66
Reputation: 509
This is hideous, I know, but since nobody else has answered yet, I figured I'd post it. I think this is what you wanted? Let me know:
library(data.table)
df <- structure(list
(Conversions = list(structure(list(Field = "Volume",
Unit = "m3", Value = 338L), class = "data.frame", row.names = 1L),
structure(list(Field = "Volume", Unit = "m3", Value = 450L), class = "data.frame", row.names = 1L)),
Categories = list(structure(list(CategorySystem = c("Base",
NA), Title = c("Mineral materials and glass (excluding concrete)", "213.7 Kevytbetoni, Aerated concrete"), ClassificationType = c(NA,
"Talo2000")), class = "data.frame", row.names = 1:2), structure(list(
CategorySystem = c("Base", NA), Title = c("Mineral materials and glass (excluding concrete)",
"213.7 Kevytbetoni, Aerated concrete"), ClassificationType = c(NA,
"Talo2000")), class = "data.frame", row.names = 1:2)),
DataItems.DataValueItems = list(structure(list(DataModuleCode = c("A1-A3 Conservative", "A1-A3 Typical"), Value = c(0.43, 0.36)), class = "data.frame", row.names = 1:2),
structure(list(DataModuleCode = c("A1-A3 Conservative",
"A1-A3 Typical"), Value = c(0.41, 0.34)), class = "data.frame", row.names = 1:2)),
ResourceId = c(7000000995, 7000000996)), row.names = 1:2, class = "data.frame")
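# Flatten each column in turn; as.data.frame.list spreads a list column's elements side by side
unlisted <- list()
for (i in seq_along(df)) {
  unlisted[[i]] <- rbindlist(lapply(df[i], as.data.frame.list), fill = TRUE)
}
# Bind the flattened pieces column-wise, then drop columns whose contents are exact duplicates
cbind_new_list <- as.data.frame(do.call(cbind, unlisted))
removed_duplicates <- cbind_new_list[!duplicated(as.list(cbind_new_list))]
removed_duplicates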
> removed_duplicates
  Field  Unit Value Value.1 CategorySystem Title                                            ClassificationType DataModuleCode     Value.2 Value.1.1 X7000000995 X7000000996
1 Volume m3   338   450     Base           Mineral materials and glass (excluding concrete) <NA>               A1-A3 Conservative 0.43    0.41      7000000995  7000000996
2 Volume m3   338   450     <NA>           213.7 Kevytbetoni, Aerated concrete              Talo2000           A1-A3 Typical      0.36    0.34      7000000995  7000000996
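Note that in the output above the ResourceId values end up as their own columns (X7000000995, X7000000996) rather than as a key on each row. If what you need is one long table per nested column that can still be joined back on ResourceId, here is a rough sketch of another route. It relies on the idcol argument of rbindlist, which records which list element each row came from. The helper name stack_with_id and the temporary column row_id are made up for this example, and the toy df is a cut-down copy of the question's data, so treat it as a starting point rather than a finished solution.

```r
library(data.table)

# Toy input shaped like the question's data (values copied from its dput;
# DataItems.DataValueItems is left out for brevity).
cats <- data.frame(
  CategorySystem = c("Base", NA),
  Title = c("Mineral materials and glass (excluding concrete)",
            "213.7 Kevytbetoni, Aerated concrete"),
  ClassificationType = c(NA, "Talo2000")
)
df <- data.frame(ResourceId = c(7000000995, 7000000996))
df$Conversions <- list(
  data.frame(Field = "Volume", Unit = "m3", Value = 338L),
  data.frame(Field = "Volume", Unit = "m3", Value = 450L)
)
df$Categories <- list(cats, cats)

# Hypothetical helper: stack one list column and tag every row with the
# ResourceId of the parent row it came from.
stack_with_id <- function(df, col, id = "ResourceId") {
  # For an unnamed list, rbindlist() numbers the pieces 1..n in the idcol.
  out <- rbindlist(unname(df[[col]]), fill = TRUE, idcol = "row_id")
  # Look up the id of each stacked row by its parent row position.
  set(out, j = id, value = df[[id]][out$row_id])
  # Drop the temporary position column and put the id first.
  set(out, j = "row_id", value = NULL)
  setcolorder(out, id)
  out[]
}

conversions <- stack_with_id(df, "Conversions")
categories  <- stack_with_id(df, "Categories")
```

Each result keeps ResourceId as a regular column, so the tables can be merged with each other or with the original data afterwards. If a column such as Value really mixes integer and character across rows in your full data, converting it with as.character in each sub dataframe before stacking is the conservative choice.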
Upvotes: 1