ESELIA

Reputation: 162

Add multiple data frames together in a list

I feel like this should have a really simple/elegant solution, but I just can't find it. (I'm relatively new to R, so that's no surprise.)

I have a (large) nested list containing data.frames that I'm trying to add together. Here is code to create some sample data:

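# Create sample data nested in a list.
# Note: cbind() returns a matrix, and list() wraps it, so each dfN is a
# one-element list holding a 4 x 4 integer matrix (not a data.frame).
for (i in 1:6) {
  for (j in 1:4) {
    assign(paste0("v", j), sample.int(100, 4))  # random columns v1..v4
  }
  assign(paste0("df", i), list(cbind(v1, v2, v3, v4)))
}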

inner1 <- list(data1 = df1, data2 = df2)
inner2 <- list(data1 = df3, data2 = df4)
inner3 <- list(data1 = df5, data2 = df6)

outer <- list(group1 = inner1, group2 = inner2, group3 = inner3)

I need to add all the data frames labeled data1 together and all the data2's together. If they weren't in this nested list format, I'd do this:

data1.tot <- df1 + df3 + df5
data2.tot <- df2 + df4 + df6
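With the sample data above, each dfN is a one-element list wrapping a matrix, so that direct sum needs one unwrapping step with [[1]]. A small sketch, using stand-in objects shaped like the sample data (the helper wrap() and the seed are hypothetical additions, not part of the question):

```r
# Hypothetical stand-ins shaped like the sample data: one-element lists
# wrapping 4 x 4 integer matrices (seed added only for reproducibility).
set.seed(1)
wrap <- function() list(matrix(sample.int(100, 16), nrow = 4,
                               dimnames = list(NULL, paste0("v", 1:4))))
df1 <- wrap(); df3 <- wrap(); df5 <- wrap()

# Adding the wrapping lists directly fails ...
msg <- tryCatch(df1 + df3 + df5, error = conditionMessage)
msg  # "non-numeric argument to binary operator"

# ... but adding the wrapped matrices works cell by cell
data1.tot <- df1[[1]] + df3[[1]] + df5[[1]]
data1.tot
```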

Because they are in a list, I thought there might be an lapply solution and tried:

grp <- c("group1", "group2", "group3") #vector of groups to sum across
datas <- lapply(outer, "[[", "data1") #select "data1" from all groups
tot.datas <- lapply(datas[grp], "+") #to sum across selected data
#I know these last two steps can be combined into one but it helps me keep everything straight to separate them

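But it returns Error in FUN(left): invalid argument to unary operator, because lapply() applies "+" to each element of datas on its own: "+" only ever receives a single argument (a list), so nothing gets added across the groups.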
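To illustrate why lapply() cannot do this and what does fold a binary operator across a list, here is a small sketch on hypothetical 2 x 2 matrices: lapply() calls "+" once per element (a unary plus), while Reduce() combines the elements pairwise.

```r
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)
m3 <- matrix(9:12, nrow = 2)

# lapply(x, "+") calls `+`(x[[i]]) once per element: a unary plus that
# never sees the other elements, so nothing is added together
same <- lapply(list(m1, m2, m3), "+")
identical(same[[1]], m1)  # TRUE: unary plus returns a matrix unchanged

# When the element is itself a list (as in the question's data), it errors
msg <- tryCatch(+list(m1), error = conditionMessage)
msg  # "invalid argument to unary operator"

# Reduce() folds the binary `+` across the list: (m1 + m2) + m3
tot <- Reduce(`+`, list(m1, m2, m3))
identical(tot, m1 + m2 + m3)  # TRUE
```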

I've also looked at other solutions like this one: Adding selected data frames together, from a list of data frames

But the nested structure of my data makes me unsure of how to translate that solution to my problem.

And just to note: the data I'm working with are GHCN-Daily data, so the structure is not my design. Any help would be greatly appreciated.

UPDATE: I've partially figured out a fix using @Parfait's suggestion of Reduce, but now I need to automate it. I'm working on a solution using a for loop because that gives me more control over the elements I'm accessing, but I'm open to other ideas. Here is the manual solution that works:

get.df <- function(x, y, z) {
# function to pull out the desired data.frame from the list
# x included as argument to make function applicable to my real data
  output <- x[[y]][[z]]
  output[[1]]
}

output1 <- get.df(x = outer, y = "group1", z = "data1")
output2 <- get.df(x = outer, y = "group2", z = "data1")
data1 <- list(output1, output2)
data1.tot <- Reduce(`+`, data1)

Using my sample data, I'd like to loop this over 2 data types ("data1" and "data2") and 3 groups ("group1", "group2", "group3"). I'm working on a for loop solution, but struggling with how to save output1 and output2 in a list. My loop looks like this right now:

dat <- c("data1", "data2")
grp <- c("group1", "group2", "group3")

for(i in 1:length(dat)) {
  for(j in 1:length(grp)) {
    assign(paste0("out", j), get.df(x = outer, y = grp[j], z = dat[i]))
  }
list(??? #clearly this is where I'm stuck!
}

Any suggestions either on the for loop problem, or for a better method?
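One way the loop could collect its pieces, as a minimal sketch: preallocate a list with vector("list", n), fill it with pieces[[j]] <- ... instead of assign(), and hand the filled list to Reduce(). The sample data are rebuilt with a hypothetical helper make_df() so the block runs on its own; get.df() is the question's function, unchanged.

```r
# Rebuild data shaped like the question's: each dataN is a one-element
# list wrapping a 4 x 4 matrix (make_df() and the seed are hypothetical).
set.seed(42)
make_df <- function() list(matrix(sample.int(100, 16), nrow = 4,
                                  dimnames = list(NULL, paste0("v", 1:4))))
outer <- list(
  group1 = list(data1 = make_df(), data2 = make_df()),
  group2 = list(data1 = make_df(), data2 = make_df()),
  group3 = list(data1 = make_df(), data2 = make_df())
)

get.df <- function(x, y, z) {
  output <- x[[y]][[z]]
  output[[1]]
}

dat <- c("data1", "data2")
grp <- c("group1", "group2", "group3")

totals <- vector("list", length(dat))  # one slot per data type
names(totals) <- dat
for (i in seq_along(dat)) {
  pieces <- vector("list", length(grp))  # one slot per group
  for (j in seq_along(grp)) {
    pieces[[j]] <- get.df(x = outer, y = grp[j], z = dat[i])
  }
  totals[[i]] <- Reduce(`+`, pieces)  # cell-by-cell sum across groups
}
totals$data1
```

Storing results in a list rather than creating out1, out2, ... with assign() keeps everything in one object that Reduce() and lapply() can work with directly.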

Upvotes: 1

Views: 938

Answers (3)

Parfait

Reputation: 107567

Consider Reduce, which works on lists. This higher-order function is a compact way to run nested calls: ((df1 + df2) + df3) + ...

data1.tot <- Reduce(`+`, lapply(outer, "[[", "data1"))

data2.tot <- Reduce(`+`, lapply(outer, "[[", "data2"))
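This works when each dataN is a data.frame, as in the demonstration data. With the question's own sample data, each dataN is a one-element list wrapping a matrix, so Reduce() would try to add lists and fail. A sketch of one adjustment, assuming that wrapped structure (the helper m() and its values are hypothetical), unwraps each element before adding:

```r
# Small stand-in for the question's structure (hypothetical values)
m <- function(k) list(matrix(k, nrow = 2, ncol = 2))
outer <- list(group1 = list(data1 = m(1), data2 = m(10)),
              group2 = list(data1 = m(2), data2 = m(20)),
              group3 = list(data1 = m(3), data2 = m(30)))

# Without unwrapping, `+` is applied to lists and errors
err <- tryCatch(Reduce(`+`, lapply(outer, "[[", "data1")),
                error = conditionMessage)

# g[["data1"]][[1]] pulls the matrix out of its one-element list
data1.tot <- Reduce(`+`, lapply(outer, function(g) g[["data1"]][[1]]))
data2.tot <- Reduce(`+`, lapply(outer, function(g) g[["data2"]][[1]]))
data1.tot  # every cell is 1 + 2 + 3 = 6
```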

To demonstrate with random data:

Data

set.seed(9262018)

dfList <- setNames(replicate(6, data.frame(NUM1=runif(50),
                                           NUM2=runif(50),
                                           NUM3=runif(50)), simplify = FALSE),
                   paste0("df", 1:6))

list2env(dfList, .GlobalEnv)

inner1 <- list(data1 = df1, data2 = df2)
inner2 <- list(data1 = df3, data2 = df4)
inner3 <- list(data1 = df5, data2 = df6)

outer <- list(group1 = inner1, group2 = inner2, group3 = inner3)

Output

data1.tot <- Reduce(`+`, lapply(outer, "[[", "data1"))
head(data1.tot, 10)
#         NUM1      NUM2      NUM3
# 1  2.0533870 1.3821609 1.0702992
# 2  2.6046584 1.7260646 1.9699774
# 3  2.2510810 1.6690353 1.4495476
# 4  1.7636879 1.2357098 1.9483906
# 5  1.0189969 2.1191041 1.7466040
# 6  1.3933982 0.7541027 1.0971724
# 7  1.8058803 2.4608417 0.7291335
# 8  1.0763517 1.2494739 1.0480818
# 9  0.7069873 1.5496575 1.2264486
# 10 0.9522526 2.1407523 1.2597422

data2.tot <- Reduce(`+`, lapply(outer, "[[", "data2"))
head(data2.tot, 10)
#         NUM1      NUM2      NUM3
# 1  1.7568578 0.9322930 1.5579897
# 2  0.9455063 0.9211592 1.7067779
# 3  1.2698614 0.4623059 0.9426310
# 4  1.6791964 1.4304953 1.2435480
# 5  0.8088625 2.6107952 1.2308862
# 6  1.8202400 2.3511104 1.5676112
# 7  0.9765578 0.8870206 0.6725699
# 8  2.6448770 1.8931751 1.8188512
# 9  1.6114870 1.8632245 0.7452924
# 10 0.9710550 1.8367305 2.0994788

Equality Test

all.equal(data1.tot, df1 + df3 + df5)
# [1] TRUE
all.equal(data2.tot, df2 + df4 + df6)
# [1] TRUE

identical(data1.tot, df1 + df3 + df5)
# [1] TRUE
identical(data2.tot, df2 + df4 + df6)
# [1] TRUE
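The identical() checks pass because Reduce() adds left to right, the same order of operations as df1 + df3 + df5. If the order changes (for example with right = TRUE), floating-point rounding can differ in the last bits, which is why all.equal() is generally the safer comparison. A tiny sketch with scalar values:

```r
x <- list(0.1, 0.2, 0.3)
left  <- Reduce(`+`, x)                # (0.1 + 0.2) + 0.3
right <- Reduce(`+`, x, right = TRUE)  # 0.1 + (0.2 + 0.3)

identical(left, 0.1 + 0.2 + 0.3)  # TRUE: same order of operations
identical(left, right)            # FALSE: rounding depends on order
isTRUE(all.equal(left, right))    # TRUE: equal within tolerance
```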

Upvotes: 1

Tino

Reputation: 2101

Is this what you want?

sapply(
  X = names(outer[[1]]),
  FUN = function(d) {
    Reduce(x = unlist(lapply(outer, "[[", d), recursive = FALSE), f = "+")
  },
  simplify = FALSE,
  USE.NAMES = TRUE
)
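The unlist(..., recursive = FALSE) step is what lets this run on the question's data: lapply() returns a list of one-element lists, and unlist() with recursive = FALSE removes exactly one level, leaving a plain list of matrices for Reduce(). A small check on hypothetical 2 x 2 inputs (the helper m() is made up for illustration):

```r
m <- function(k) list(matrix(k, nrow = 2, ncol = 2))
outer <- list(group1 = list(data1 = m(1), data2 = m(10)),
              group2 = list(data1 = m(2), data2 = m(20)))

# One level stripped: each remaining element is a matrix
pieces <- unlist(lapply(outer, "[[", "data1"), recursive = FALSE)
vapply(pieces, is.matrix, logical(1))

# Same pattern as the answer, one total per name in the inner lists
totals <- sapply(names(outer[[1]]), function(d) {
  Reduce(`+`, unlist(lapply(outer, "[[", d), recursive = FALSE))
}, simplify = FALSE, USE.NAMES = TRUE)
totals$data1  # every cell is 1 + 2 = 3
```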

Upvotes: 0

Joe

Reputation: 1768

Here is a solution that works fine if each inner list contains only a few data frames:

sum_df1 <- sum(unlist(lapply(outer, "[[", 1)))
sum_df2 <- sum(unlist(lapply(outer, "[[", 2)))
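Note that sum() collapses all of those values into one grand total; it does not add the data frames cell by cell the way df1 + df3 + df5 does. A small contrast on hypothetical 2 x 2 inputs (the helper m() is made up), with the Reduce() pattern from the other answers for comparison:

```r
m <- function(k) list(matrix(k, nrow = 2, ncol = 2))
outer <- list(group1 = list(data1 = m(1)),
              group2 = list(data1 = m(2)))

# sum(): a single number, 4 * 1 + 4 * 2 = 12
grand <- sum(unlist(lapply(outer, "[[", 1)))

# Reduce(`+`): a 2 x 2 matrix where every cell is 1 + 2 = 3
cellwise <- Reduce(`+`, unlist(lapply(outer, "[[", 1), recursive = FALSE))
grand
cellwise
```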

If each inner list contains, e.g., 1000 data frames, use:

dfs <- 1:1000
lapply(dfs, function(x) sum(unlist(lapply(outer, "[[", x))))
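Rather than hard-coding 1000, the index vector can be taken from the data itself with seq_along(). A sketch, assuming every inner list holds the same number of data frames in the same order (inputs are hypothetical):

```r
m <- function(k) list(matrix(k, nrow = 2, ncol = 2))
outer <- list(group1 = list(m(1), m(10), m(100)),
              group2 = list(m(2), m(20), m(200)))

# seq_along() yields 1, 2, 3 here: one index per inner data frame
idx <- seq_along(outer[[1]])
totals <- lapply(idx, function(x) sum(unlist(lapply(outer, "[[", x))))
unlist(totals)  # 12 120 1200
```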

This will give you a list where each element is the sum of all values in the corresponding inner data frames: a single number per position, not a cell-by-cell sum.

Upvotes: 0
