Reputation: 486
I want to generate a single dataframe from the list of dataframes in which all columns are same except one column 'Income' which should be sum of the all the 'Income' in the list.
Here is my list of dataframe
mylist= structure(list(`1` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(0, 0, 0, 9100)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame"), `2` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(0, 0, 0, 0)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame"), `3` = structure(list(ID = c(36L, 37L, 38L, 39L), Income = c(7360, 0, 0, 0)), .Names = c("ID", "Income"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame")))
> mylist
$`1`
ID Income
1 36 0
2 37 0
3 38 0
4 39 9100
$`2`
ID Income
1 36 0
2 37 0
3 38 0
4 39 0
$`3`
ID Income
1 36 7360
2 37 0
3 38 0
4 39 0
This is what I want to do :
ID Income
34 36 7360
26 37 0
23 38 0
15 39 9100
I have tried to use reduce() to do the sum but it creates a separate column which I want to avoid :
Reduce(function(df1, df2) data.frame(df1[,], res=df1["Income"] + df2["Income"]),mylist)
Upvotes: 4
Views: 1112
Reputation: 763
Why don you do rbind in your list of dataframes and then aggregate?
system.time(teste <- do.call("rbind",mylist))
# user system elapsed
# 0.004 0.000 0.000
system.time(r1 <- Reduce(function(...) merge(..., by = "ID"), mylist))
# user system elapsed
# 0.004 0.000 0.002
aggregate(teste$Income, by=list(teste$ID), sum)
# Group.1 x
#1 36 7360
#2 37 0
#3 38 0
#4 39 9100
Upvotes: 0
Reputation: 887971
If the 'ID's in different list
can be different, we merge
the datasets using Reduce
, and then do the rowSums
of the output except the first one to create the 'Income' column.
r1 <- Reduce(function(...) merge(..., by = "ID"), mylist)
data.frame(r1[1], Income = rowSums(r1[-1]))
# ID Income
#1 36 7360
#2 37 0
#3 38 0
#4 39 9100
If the 'IDs' are the same and in the same order for all the datasets in the list
, we create the data.frame
by extracting the 'ID' from the first element of 'mylist' and get the sum of the 'Income' using Reduce
with +
.
data.frame(mylist[[1]][1], Reduce(`+`, lapply(mylist, `[`, 'Income')))
Upvotes: 4