Android Beginner
Android Beginner

Reputation: 486

Sum the columns in a list of dataframes

I want to generate a single dataframe from the list of dataframes in which all columns are same except one column 'Income' which should be sum of the all the 'Income' in the list.

Here is my list of dataframe

mylist= structure(list(`1` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(0, 0, 0, 9100)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame"), `2` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(0, 0, 0, 0)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame"), `3` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(7360, 0, 0, 0)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame")))

> mylist 
$`1`
  ID Income
1 36      0
2 37      0
3 38      0
4 39   9100

$`2`
  ID Income
1 36      0
2 37      0
3 38      0
4 39      0

$`3`
  ID Income
1 36   7360
2 37      0
3 38      0
4 39      0

This is what I want to do :

   ID   Income
34 36     7360
26 37        0
23 38        0
15 39     9100

I have tried to use reduce() to do the sum but it creates a separate column which I want to avoid :

Reduce(function(df1, df2) data.frame(df1[,], res=df1["Income"] + df2["Income"]),mylist)

Upvotes: 4

Views: 1112

Answers (2)

Sergio
Sergio

Reputation: 763

Why don you do rbind in your list of dataframes and then aggregate?

system.time(teste <- do.call("rbind",mylist))
#   user  system elapsed 
#  0.004   0.000   0.000 

system.time(r1 <- Reduce(function(...) merge(..., by = "ID"), mylist))
#   user  system elapsed 
#  0.004   0.000   0.002 
aggregate(teste$Income, by=list(teste$ID), sum)
#  Group.1    x
#1      36 7360
#2      37    0
#3      38    0
#4      39 9100

Upvotes: 0

akrun
akrun

Reputation: 887971

If the 'ID's in different list can be different, we merge the datasets using Reduce, and then do the rowSums of the output except the first one to create the 'Income' column.

r1 <- Reduce(function(...) merge(..., by = "ID"), mylist) 
data.frame(r1[1], Income = rowSums(r1[-1]))
#   ID Income
#1 36   7360
#2 37      0
#3 38      0
#4 39   9100

If the 'IDs' are the same and in the same order for all the datasets in the list, we create the data.frame by extracting the 'ID' from the first element of 'mylist' and get the sum of the 'Income' using Reduce with +.

data.frame(mylist[[1]][1], Reduce(`+`, lapply(mylist, `[`, 'Income')))

Upvotes: 4

Related Questions