Reputation: 162
I feel like this should have a really simple/elegant solution but I just can't find it. (I'm relatively new to R, so that's no surprise.)
I have a (large) nested list containing data.frames that I'm trying to add together. Here is code to create some sample data:
#Create data frames nested in a list
for (i in 1:6) {
  for (j in 1:4) {
    assign(paste0("v", j), sample.int(100, 4))
  }
  assign(paste0("df", i), list(cbind(v1, v2, v3, v4)))
}
inner1 <- list(data1 = df1, data2 = df2)
inner2 <- list(data1 = df3, data2 = df4)
inner3 <- list(data1 = df5, data2 = df6)
outer <- list(group1 = inner1, group2 = inner2, group3 = inner3)
I need to add all the data frames labeled data1 together and all the data2's together. If they weren't in this nested list format, I'd do this:
data1.tot <- df1 + df3 + df5
data2.tot <- df2 + df4 + df6
Because they are in a list, I thought there might be an lapply solution and tried:
grp <- c("group1", "group2", "group3") #vector of groups to sum across
datas <- lapply(outer, "[[", "data1") #select "data1" from all groups
tot.datas <- lapply(datas[grp], "+") #to sum across selected data
#I know these last two steps can be combined into one but it helps me keep everything straight to separate them
But it returns Error in FUN(left): invalid argument to unary operator because I'm passing the list of datas as x.
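A minimal sketch of why that error appears, using two short numeric vectors as hypothetical stand-ins for the real data: lapply hands each element to `+` on its own, so `+` runs as a unary operator, whereas Reduce hands it two elements at a time.

```r
# Two small numeric vectors stand in for the data frames (illustrative only).
x <- list(a = 1:3, b = 4:6)

# lapply() calls `+` with ONE argument per element (unary plus),
# so each element comes back unchanged instead of being added to the others.
unary <- lapply(x, "+")

# Unary plus is not defined for a list. In the sample data each element of
# `datas` is itself a list wrapping a matrix, which triggers the error.
err <- tryCatch(+list(1), error = function(e) conditionMessage(e))

# Reduce() calls `+` with TWO arguments, folding the list pairwise.
total <- Reduce(`+`, x)
```

Here `unary` is just a copy of `x`, `err` holds the error message, and `total` is the element-wise sum of the two vectors.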
I've also looked at other solutions like this one: Adding selected data frames together, from a list of data frames
But the nested structure of my data makes me unsure of how to translate that solution to my problem.
And just to note, the data I'm working with are GHCN Daily data, so the structure is not my design. Any help would be greatly appreciated.
UPDATE:
I've partially figured out a fix using the suggestion of Reduce by @Parfait, but now I need to automate it. I'm working on a solution using a for loop because that gives me more control over the elements I'm accessing, but I'm open to other ideas. Here is the manual solution that works:
get.df <- function(x, y, z) {
  # function to pull out the desired data.frame from the list
  # x included as argument to make function applicable to my real data
  output <- x[[y]][[z]]
  output[[1]]
}
output1 <- get.df(x = outer, y = "group1", z = "data1")
output2 <- get.df(x = outer, y = "group2", z = "data1")
data1 <- list(output1, output2)
data1.tot <- Reduce(`+`, data1)
Using my sample data, I'd like to loop this over 2 data types ("data1" and "data2") and 3 groups ("group1", "group2", "group3"). I'm working on a for loop solution, but struggling with how to save output1 and output2 in a list. My loop looks like this right now:
dat <- c("data1", "data2")
grp <- c("group1", "group2", "group3")
for(i in 1:length(dat)) {
  for(j in 1:length(grp)) {
    assign(paste0("out", j), get.df(x = outer, y = grp[j], z = dat[i]))
  }
  list(??? #clearly this is where I'm stuck!
}
Any suggestions either on the for loop problem, or for a better method?
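One possible way to collect the loop's intermediate results without assign() is to fill a pre-allocated, named list and reduce it once the inner loop finishes. The sketch below is only an illustration: `make_entry`, `pieces`, and `totals` are hypothetical names, and the small `outer` list is rebuilt here just so the block runs on its own.

```r
# Small stand-in for the nested structure: each "data" entry is a list
# wrapping one 4 x 4 matrix, mirroring the sample data (hypothetical helper).
set.seed(1)
make_entry <- function() list(matrix(sample.int(100, 16), nrow = 4))
outer <- list(
  group1 = list(data1 = make_entry(), data2 = make_entry()),
  group2 = list(data1 = make_entry(), data2 = make_entry()),
  group3 = list(data1 = make_entry(), data2 = make_entry())
)

# Same idea as get.df above: pull the matrix out of its wrapping list.
get.df <- function(x, y, z) x[[y]][[z]][[1]]

dat <- c("data1", "data2")
grp <- c("group1", "group2", "group3")

totals <- vector("list", length(dat))    # one slot per data type
names(totals) <- dat

for (d in dat) {
  pieces <- vector("list", length(grp))  # one slot per group
  names(pieces) <- grp
  for (g in grp) {
    pieces[[g]] <- get.df(x = outer, y = g, z = d)
  }
  totals[[d]] <- Reduce(`+`, pieces)     # element-wise sum across groups
}
```

After the loop, `totals$data1` and `totals$data2` hold the summed matrices, so nothing has to be created in the global environment by name.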
Upvotes: 1
Views: 938
Reputation: 107567
Consider Reduce, which works on lists. This higher-order function is a compact way to run nested calls: ((df1 + df2) + df3) + ...
data1.tot <- Reduce(`+`, lapply(outer, "[[", "data1"))
data2.tot <- Reduce(`+`, lapply(outer, "[[", "data2"))
To demonstrate with random data
Data
set.seed(9262018)
dfList <- setNames(replicate(6, data.frame(NUM1 = runif(50),
                                           NUM2 = runif(50),
                                           NUM3 = runif(50)), simplify = FALSE),
                   paste0("df", 1:6))
list2env(dfList, .GlobalEnv)
inner1 <- list(data1 = df1, data2 = df2)
inner2 <- list(data1 = df3, data2 = df4)
inner3 <- list(data1 = df5, data2 = df6)
outer <- list(group1 = inner1, group2 = inner2, group3 = inner3)
Output
data1.tot <- Reduce(`+`, lapply(outer, "[[", "data1"))
head(data1.tot, 10)
#         NUM1      NUM2      NUM3
# 1  2.0533870 1.3821609 1.0702992
# 2  2.6046584 1.7260646 1.9699774
# 3  2.2510810 1.6690353 1.4495476
# 4  1.7636879 1.2357098 1.9483906
# 5  1.0189969 2.1191041 1.7466040
# 6  1.3933982 0.7541027 1.0971724
# 7  1.8058803 2.4608417 0.7291335
# 8  1.0763517 1.2494739 1.0480818
# 9  0.7069873 1.5496575 1.2264486
# 10 0.9522526 2.1407523 1.2597422
data2.tot <- Reduce(`+`, lapply(outer, "[[", "data2"))
head(data2.tot, 10)
#         NUM1      NUM2      NUM3
# 1  1.7568578 0.9322930 1.5579897
# 2  0.9455063 0.9211592 1.7067779
# 3  1.2698614 0.4623059 0.9426310
# 4  1.6791964 1.4304953 1.2435480
# 5  0.8088625 2.6107952 1.2308862
# 6  1.8202400 2.3511104 1.5676112
# 7  0.9765578 0.8870206 0.6725699
# 8  2.6448770 1.8931751 1.8188512
# 9  1.6114870 1.8632245 0.7452924
# 10 0.9710550 1.8367305 2.0994788
Equality Test
all.equal(data1.tot, df1 + df3 + df5)
# [1] TRUE
all.equal(data2.tot, df2 + df4 + df6)
# [1] TRUE
identical(data1.tot, df1 + df3 + df5)
# [1] TRUE
identical(data2.tot, df2 + df4 + df6)
# [1] TRUE
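The nested-call description of Reduce can also be checked step by step. This is a small sketch with made-up values (`a`, `b`, and `c3` are hypothetical data frames, not part of the question's data); it uses the accumulate argument of Reduce to expose each intermediate sum.

```r
# Three tiny data frames with made-up values.
a  <- data.frame(x = c(1, 2),     y = c(3, 4))
b  <- data.frame(x = c(10, 20),   y = c(30, 40))
c3 <- data.frame(x = c(100, 200), y = c(300, 400))
dfs <- list(a, b, c3)

# Folding the list with `+` ...
folded <- Reduce(`+`, dfs)

# ... matches writing the nested calls out by hand.
nested <- (a + b) + c3

# accumulate = TRUE keeps every intermediate result:
# steps[[1]] is a, steps[[2]] is a + b, steps[[3]] is the full sum.
steps <- Reduce(`+`, dfs, accumulate = TRUE)
```

Inspecting `steps` can help when one data frame in a long list has a different shape and the fold fails partway through.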
Upvotes: 1
Reputation: 2101
Is this what you want?
sapply(
  X = names(outer[[1]]),
  FUN = function(d) {
    Reduce(x = unlist(lapply(outer, "[[", d), recursive = FALSE), f = "+")
  },
  simplify = FALSE,
  USE.NAMES = TRUE
)
Upvotes: 0
Reputation: 1768
Here is a solution that works fine if each inner list contains only a few data frames:
sum_df1 <- sum(unlist(lapply(outer, "[[", 1)))
sum_df2 <- sum(unlist(lapply(outer, "[[", 2)))
If each inner list contains e.g. 1000 data frames, use:
dfs <- seq_len(1000)
lapply(dfs, function(x) sum(unlist(lapply(outer, "[[", x))))
This will give you a list where each element is a single number: the grand total of all values in the inner data frames at that position, not an element-wise sum.
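To see the difference between a grand total and an element-wise sum concretely, here is a minimal sketch with two made-up data frames (`a` and `b` are hypothetical, not from the question):

```r
# Two tiny data frames with made-up values.
a <- data.frame(x = c(1, 2),   y = c(3, 4))
b <- data.frame(x = c(10, 20), y = c(30, 40))
dfs <- list(a, b)

# unlist() flattens every cell into one numeric vector,
# so sum() returns a single number for the whole list.
grand_total <- sum(unlist(dfs))

# Reduce() with `+` keeps the data frame shape and adds cell by cell.
elementwise <- Reduce(`+`, dfs)
```

Which of the two is appropriate depends on whether a single overall total or a data frame of per-cell sums is wanted; the question's df1 + df3 + df5 example corresponds to the second form.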
Upvotes: 0