Reputation: 40618
I have a list with the following example structure:
> dput(test)
structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(
var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2",
"var3")), section2 = structure(list(row = structure(list(var1 = 1,
var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")),
row = structure(list(var1 = 4, var2 = 5, var3 = 6), .Names = c("var1",
"var2", "var3")), row = structure(list(var1 = 7, var2 = 8,
var3 = 9), .Names = c("var1", "var2", "var3"))), .Names = c("row",
"row", "row"))), .Names = c("id", "var1", "var3", "section1",
"section2"))
> str(test)
List of 5
$ id : num 1
$ var1 : num 2
$ var3 : num 4
$ section1:List of 3
..$ var1: num 1
..$ var2: num 2
..$ var3: num 3
$ section2:List of 3
..$ row:List of 3
.. ..$ var1: num 1
.. ..$ var2: num 2
.. ..$ var3: num 3
..$ row:List of 3
.. ..$ var1: num 4
.. ..$ var2: num 5
.. ..$ var3: num 6
..$ row:List of 3
.. ..$ var1: num 7
.. ..$ var2: num 8
.. ..$ var3: num 9
Notice that the section2 list contains elements named row. These represent multiple records. What I have is a nested list where some elements are at the root level and others are multiple nested records for the same observation. I would like the following output in a data.frame format:
> desired
id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
1 1 2 4 1 2 3 1 4 7
2 NA NA NA NA NA NA 2 5 8
3 NA NA NA NA NA NA 3 6 9
Root-level elements should populate the first row, while row elements should have their own rows. As an added complication, the number of variables in the row entries can vary.
Upvotes: 5
Views: 1917
Reputation: 49448
This starts similarly to tiffany's answer, but diverges a bit afterwards.
library(data.table)
# flatten the first level
flat = unlist(test, recursive = FALSE)
# compute max length
N = max(sapply(flat, length))
# pad NA's and convert to data.table (at this point it will *look* like the right answer)
dt = as.data.table(lapply(flat, function(l) c(l, rep(NA, N - length(l)))))
# but in reality some of the columns are lists - check by running sapply(dt, class)
# so unlist them
dt = dt[, lapply(.SD, unlist)]
# id var1 var3 section1.var1 section1.var2 section1.var3 section2.row section2.row section2.row
#1: 1 2 4 1 2 3 1 4 7
#2: NA NA NA NA NA NA 2 5 8
#3: NA NA NA NA NA NA 3 6 9
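For readers without data.table, the same pad-and-bind idea can be sketched in base R. This is only an illustration under the assumption that the input has the shape of the question's example; the names cols and df are placeholders of mine, not part of the answer above. Note that padding is done by position, not by variable name, so a row that lacks its first variable would not line up with the others.

```r
# Same example list as in the question
test <- list(
  id = 1, var1 = 2, var3 = 4,
  section1 = list(var1 = 1, var2 = 2, var3 = 3),
  section2 = list(
    row = list(var1 = 1, var2 = 2, var3 = 3),
    row = list(var1 = 4, var2 = 5, var3 = 6),
    row = list(var1 = 7, var2 = 8, var3 = 9)
  )
)

# Flatten only the first level: root values and section1 values stay scalars,
# while each 'row' stays a list of values
flat <- unlist(test, recursive = FALSE)

# Length of the longest element (here: the number of values in a row)
n <- max(lengths(flat))

# Turn every element into a plain vector and pad it with NA up to length n
cols <- lapply(flat, function(x) {
  x <- unlist(x, use.names = FALSE)
  c(x, rep(NA, n - length(x)))
})

# The three rows share the name "section2.row"; make the names unique
names(cols) <- make.unique(names(cols))

df <- as.data.frame(cols)
df
```

Because each element is unlisted before padding, the columns come out as ordinary numeric vectors rather than list columns.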
Upvotes: 0
Reputation: 503
Here's a general approach. It doesn't assume that you'll have only three rows; it works with however many rows you have. And if a value is missing in the nested structure (e.g. var1 doesn't exist for some sub-lists in section2), the code correctly returns NA for that cell.
E.g. if we use the following data:
test <- structure(list(id = 1, var1 = 2, var3 = 4, section1 = structure(list(var1 = 1, var2 = 2, var3 = 3), .Names = c("var1", "var2", "var3")), section2 = structure(list(row = structure(list(var1 = 1, var2 = 2), .Names = c("var1", "var2")), row = structure(list(var1 = 4, var2 = 5), .Names = c("var1", "var2")), row = structure(list( var2 = 8, var3 = 9), .Names = c("var2", "var3"))), .Names = c("row", "row", "row"))), .Names = c("id", "var1", "var3", "section1", "section2"))
The general approach is to use melt to create a dataframe that includes information about the nested structure, and then dcast to mold it into the format you desire.
library("reshape2")
flat <- unlist(test, recursive=FALSE)
names(flat)[grep("row", names(flat))] <- gsub("row", "var", paste0(names(flat)[grep("row", names(flat))], seq_len(length(names(flat)[grep("row", names(flat))])))) ## keeps track of rows by adding an ID
ul <- melt(unlist(flat))
split <- strsplit(rownames(ul), split=".", fixed=TRUE) ## splits the names into component parts
max <- max(unlist(lapply(split, FUN=length)))
pad <- function(a) {
c(a, rep(NA, max-length(a)))
}
levels <- matrix(unlist(lapply(split, FUN=pad)), ncol=max, byrow=TRUE)
## Get the nesting structure
nested <- data.frame(levels, ul)
nested$X3[is.na(nested$X3)] <- levels(as.factor(nested$X3))[[1]]
desired <- dcast(nested, X3~X1 + X2)
names(desired) <- gsub("_", "\\.", gsub("_NA", "", names(desired)))
desired <- desired[,names(flat)]
> desired
## id var1 var3 section1.var1 section1.var2 section1.var3 section2.var1 section2.var2 section2.var3
## 1 1 2 4 1 2 3 1 4 7
## 2 NA NA NA NA NA NA 2 5 8
## 3 NA NA NA NA NA NA 3 6 9
Upvotes: 4
Reputation: 10167
Since your problem is not well defined when rows have complex structures (e.g. if each row in test contained the list test, how should the rows be bound together? And what if rows in the same table have different structures?), the following solution depends on each row being a list of values.
That said, I'm guessing that in the general case, your list test will contain either values, lists of values, or lists of rows (where rows are lists of values). Also, if rows aren't always called "row", this solution still works.
temp <- lapply(test,
function(x){
if(!is.list(x))
# x is a value
return(x)
# x is a list of rows or values
out <- do.call(cbind,x)
if(nrow(out)>1){
# x is a list of rows
colnames(out)<-paste0(colnames(out),'.',rownames(out))
rownames(out)<-rep_len(NA,nrow(out))
}
return(out)
})
# a function that extends a matrix to a fixed number of rows (N)
# by appending rows of NA's
rowExtend <- function(x,N){
if((!is.matrix(x)) ){
out<-do.call(rbind,c(list(x),as.list(rep_len(NA,N - 1))))
colnames(out) <- ""
out
}else if(nrow(x) < N)
do.call(rbind,c(list(x),as.list(rep_len(NA,N - nrow(x)))))
else
x
}
# calculate the maximum number of rows
.nrows <- sapply(temp,nrow)
.nrows <- max(unlist(.nrows[!sapply(.nrows,is.null)]))
# extend the shorter rows
(temp2<-lapply(temp, rowExtend,.nrows))
# calculate new column names
newColNames <- mapply(function(x,y) {
if(nzchar(y)[1L])
paste0(x,'.',y)
else x
},
names(temp2),
lapply(temp2,colnames))
do.call(cbind,mapply(`colnames<-`,temp2,newColNames))
#> id var1 var3 section1.var1 section1.var2 section1.var3 section2.row.var1 section2.row.var2 section2.row.var3
#> 1 2 4 1 2 3 1 4 7
#> NA NA NA NA NA NA 2 5 8
#> NA NA NA NA NA NA 3 6 9
Upvotes: 0
Reputation: 13304
The central idea of this solution is to flatten all sub-lists except the sub-lists named 'row'. This can be done by creating a unique ID for each list element (stored in z) and then requiring that all elements within a single 'row' share the same ID (stored in z2; I had to write a recursive function to traverse the nested list). Then z2 can be used to group elements that belong to the same row. The resulting list can be converted into matrix form using stri_list2matrix from the stringi package, and then converted into a data frame.
utest <- unlist(test)
z <- relist(seq_along(utest),test)
recurse <- function(L) {
if (!is.list(L)) return(L)  # leaf value: nothing to traverse
b <- names(L)=='row'
L.b <- lapply(L[b],function(k) relist(rep(k[[1]],length(k)),k))
L.nb <- lapply(L[!b],recurse)
c(L.b,L.nb)
}
z2 <- unlist(recurse(z))
library(stringi)
desired <- as.data.frame(stri_list2matrix(split(utest,z2)))
names(desired) <- names(z2)[unique(z2)]
desired
# id var1 var3 section1.var1 section1.var2 section1.var3 section2.row.var1
# 1 1 2 4 1 2 3 1
# 2 <NA> <NA> <NA> <NA> <NA> <NA> 2
# 3 <NA> <NA> <NA> <NA> <NA> <NA> 3
# section2.row.var1 section2.row.var1
# 1 4 7
# 2 5 8
# 3 6 9
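One thing to watch: stri_list2matrix returns a character matrix, which is why the missing cells print as <NA> and the columns are not numeric. If numbers are needed, the columns can be converted afterwards. A minimal sketch, using a small hand-built stand-in for the result above (the two columns below are copied from the printed output for illustration and are not produced by stringi here):

```r
# Hypothetical stand-in for the character data frame shown above
desired <- data.frame(
  id = c("1", NA, NA),
  section2.row.var1 = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Convert every column back to numeric; as.character() also covers the case
# where the columns were created as factors (the default before R 4.0)
desired[] <- lapply(desired, function(col) as.numeric(as.character(col)))

str(desired)
```

Assigning into desired[] keeps the data frame structure and column names while replacing the column contents.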
Upvotes: 1