Reputation: 851
I have multiple dataframes with identical column names and dimension. :
df1
device_id price tax
1 a 200 5
2 b 100 2
3 c 50 1
df2
device_id price tax
1 b 200 7
2 a 100 3
3 c 50 1
df3
device_id price tax
1 c 50 5
2 b 300 1
3 a 50 2
What I want to do is to create another dataframe df
where I will add the price and tax values from the above three dataframes with matching device_id
s.
So,
df
would be like
df
device_id price tax
1 a 350 10
2 b 600 10
3 c 150 7
How can I do it? Also, it would be great if the solution can be generalized to larger number of dataframes instead of just 3.
Upvotes: 0
Views: 523
Reputation: 887991
We can use by
from base R
after rbind
ing after we place all the data.frame objects in a list
(mget(paste0("df", 1:3))
)
dfN <- do.call(rbind, mget(paste0("df", 1:3)))
do.call(rbind, by(dfN[-1], dfN[1], FUN = colSums))
Upvotes: 1
Reputation: 99391
First, get all your data frames into a list (called dflist
here, defined below). Then it's easy to do with aggregate()
after row-binding the list elements.
aggregate(. ~ device_id, do.call(rbind, dflist), sum)
# device_id price tax
# 1 a 350 10
# 2 b 600 10
# 3 c 150 7
Or you could use the data.table package.
library(data.table)
rbindlist(dflist)[, lapply(.SD, sum), by = device_id]
# device_id price tax
# 1: a 350 10
# 2: b 600 10
# 3: c 150 7
Or dplyr.
library(dplyr)
bind_rows(dflist) %>%
group_by(device_id) %>%
summarize_each(funs(sum))
# Source: local data frame [3 x 3]
#
# device_id price tax
# <fctr> <int> <int>
# 1 a 350 10
# 2 b 600 10
# 3 c 150 7
Data:
dflist <- structure(list(df1 = structure(list(device_id = structure(1:3, .Label = c("a",
"b", "c"), class = "factor"), price = c(200L, 100L, 50L), tax = c(5L,
2L, 1L)), .Names = c("device_id", "price", "tax"), class = "data.frame", row.names = c("1",
"2", "3")), df2 = structure(list(device_id = structure(c(2L,
1L, 3L), .Label = c("a", "b", "c"), class = "factor"), price = c(200L,
100L, 50L), tax = c(7L, 3L, 1L)), .Names = c("device_id", "price",
"tax"), class = "data.frame", row.names = c("1", "2", "3")),
df3 = structure(list(device_id = structure(c(3L, 2L, 1L), .Label = c("a",
"b", "c"), class = "factor"), price = c(50L, 300L, 50L),
tax = c(5L, 1L, 2L)), .Names = c("device_id", "price",
"tax"), class = "data.frame", row.names = c("1", "2", "3"
))), .Names = c("df1", "df2", "df3"))
Upvotes: 1