Sum the columns in a list of dataframes

Question

I want to generate a single dataframe from the list of dataframes in which all columns are same except one column 'Income' which should be sum of the all the 'Income' in the list.

Here is my list of dataframe

mylist= structure(list(`1` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(0, 0, 0, 9100)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame"), `2` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(0, 0, 0, 0)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame"), `3` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(7360, 0, 0, 0)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame")))

> mylist 
$`1`
  ID Income
1 36      0
2 37      0
3 38      0
4 39   9100

$`2`
  ID Income
1 36      0
2 37      0
3 38      0
4 39      0

$`3`
  ID Income
1 36   7360
2 37      0
3 38      0
4 39      0

This is what I want to do :

   ID   Income
34 36     7360
26 37        0
23 38        0
15 39     9100

I have tried to use reduce() to do the sum but it creates a separate column which I want to avoid :

Reduce(function(df1, df2) data.frame(df1[,], res=df1["Income"] + df2["Income"]),mylist)

akrun · Accepted Answer

If the 'ID's in different list can be different, we merge the datasets using Reduce, and then do the rowSums of the output except the first one to create the 'Income' column.

r1 <- Reduce(function(...) merge(..., by = "ID"), mylist) 
data.frame(r1[1], Income = rowSums(r1[-1]))
#   ID Income
#1 36   7360
#2 37      0
#3 38      0
#4 39   9100

If the 'IDs' are the same and in the same order for all the datasets in the list, we create the data.frame by extracting the 'ID' from the first element of 'mylist' and get the sum of the 'Income' using Reduce with +.

data.frame(mylist[[1]][1], Reduce(`+`, lapply(mylist, `[`, 'Income')))

Sum the columns in a list of dataframes

Answers (2)

Related Questions