Reputation: 805
I have two data frames:
1)
S C V1
"d" "q" 2
...
2)
C R V2
"u" "t" 5
...
I want to achieve this:
B T V
"d" "q" 2
...
"u" "t" 5
...
How can I do that efficiently?
Upvotes: 1
Views: 79
Reputation: 5254
The explicit request for efficiency, and jwarz's comment on it, made me briefly compare the three proposed solutions.
Summary
The bind_rows() approach is the fastest solution (as stated by jwarz), with the disadvantage of depending on an extra package.
The Map() approach has the advantage of being flexible but is rather slow (over 35% slower than dplyr).
Code and Results
library(dplyr)
library(microbenchmark, quietly = TRUE)
# Minimal one-row inputs from the question
df1 <- structure(list(S = "d", C = "q", V1 = 2L),
                 class = "data.frame", row.names = c(NA, -1L))
df2 <- structure(list(C = "u", R = "t", V2 = 5L),
                 class = "data.frame", row.names = c(NA, -1L))
new_cols <- c("B", "T", "V")
# Solution 1: rename both data frames, then use base rbind()
solution1 <- function(df1, df2, new_cols) {
  names(df1) <- new_cols
  names(df2) <- new_cols
  return( rbind(df1, df2) )
}
# Solution 2: rename via Map(), then rbind() the resulting list
solution2 <- function(df1, df2, new_cols) {
  dftmp <- Map(`names<-`, list(df1, df2), list(value = new_cols))
  return( do.call(rbind, dftmp) )
}
# Solution 3: rename both data frames, then use dplyr::bind_rows()
solution3 <- function(df1, df2, new_cols) {
  colnames(df2) <- colnames(df1) <- new_cols
  return( bind_rows(df1, df2) )
}
microbenchmark(
  rbind = solution1(df1, df2, new_cols),
  Map   = solution2(df1, df2, new_cols),
  dplyr = solution3(df1, df2, new_cols),
  times = 1E4L
)
#> Unit: microseconds
#>   expr  min   lq     mean median   uq     max neval
#>  rbind 70.8 78.4 87.57165   82.0 88.0  2613.2 10000
#>    Map 81.3 88.9 99.86045   93.0 99.4 10521.7 10000
#>  dplyr 53.3 62.8 70.44836   68.3 71.0  2362.6 10000
df1 <- structure(list(S  = letters[sample(1:26, 999L, replace = TRUE)],
                      C  = letters[sample(1:26, 999L, replace = TRUE)],
                      V1 = sample(1:26, 999L, replace = TRUE)),
                 class = "data.frame", row.names = 1:999)
df2 <- structure(list(C  = letters[sample(1:26, 999L, replace = TRUE)],
                      R  = letters[sample(1:26, 999L, replace = TRUE)],
                      V2 = sample(1:26, 999L, replace = TRUE)),
                 class = "data.frame", row.names = 1:999)
microbenchmark(
  rbind = solution1(df1, df2, new_cols),
  Map   = solution2(df1, df2, new_cols),
  dplyr = solution3(df1, df2, new_cols),
  times = 1E4L
)
#> Unit: microseconds
#>   expr   min    lq      mean median     uq    max neval
#>  rbind 119.5 130.1 140.10275  134.2 141.10 2751.9 10000
#>    Map 130.4 141.4 153.46169  145.8 152.65 3978.6 10000
#>  dplyr  58.3  70.8  78.97621   77.8  80.55 2289.1 10000
Created on 2021-01-03 by the reprex package (v0.3.0)
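To make the "over 35% slower" figure easy to check, here is a small sketch that recomputes the relative slowdown from the median timings in the two benchmark tables above. The numbers are copied by hand from those tables; the names medians and slowdown are just illustrative, and the percentages only describe this one run on one machine.

```r
# Median timings (microseconds) copied from the two benchmark tables above.
medians <- data.frame(
  expr  = c("rbind", "Map", "dplyr"),
  small = c(82.0, 93.0, 68.3),    # 1-row data frames
  large = c(134.2, 145.8, 77.8)   # 999-row data frames
)

# Percentage by which each approach is slower than the dplyr median.
is_dplyr <- medians$expr == "dplyr"
slowdown <- data.frame(
  expr      = medians$expr,
  small_pct = round(100 * (medians$small / medians$small[is_dplyr] - 1), 1),
  large_pct = round(100 * (medians$large / medians$large[is_dplyr] - 1), 1)
)
print(slowdown)
```

On these medians, Map() comes out roughly 36% slower than dplyr for the one-row case, and the gap widens for the 999-row data frames.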
Upvotes: 2
Reputation: 531
bind_rows() matches columns by name, so you can't use it to bind rows while ignoring column names. But you can write a small function to work around this:
library(dplyr)
force_bind <- function(df1, df2, x_names) {
  colnames(df2) <- colnames(df1) <- x_names
  bind_rows(df1, df2)
}
force_bind(df1, df2, c("B", "T", "V"))
##   B T V
## 1 d q 2
## 2 u t 5
Upvotes: 2
Reputation: 76450
Here is a base R solution that doesn't depend on the number of data frames you want to rbind together.
dftmp <- Map(`names<-`, list(df1, df2), list(value = c("B", "T", "V")))
df_final <- do.call(rbind, dftmp)
df_final
#   B T V
# 1 d q 2
# 2 u t 5
Data
df1 <- read.table(text = "
S C V1
d q 2
", header = TRUE)
df2 <- read.table(text = "
C R V2
u t 5
", header = TRUE)
Upvotes: 2
Reputation: 388982
You can give both data frames the same column names and then combine them with rbind().
new_cols <- c('B', 'T', 'V')
names(df1) <- new_cols
names(df2) <- new_cols
result <- rbind(df1, df2)
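As a rough illustration of why the renaming step is needed, the sketch below first calls rbind() on the original data frames, which fails because rbind() matches data frame columns by name, and then binds renamed copies. It uses setNames() instead of names<- so that df1 and df2 are left unmodified; that is a variation for illustration, not a change to the approach above.

```r
df1 <- data.frame(S = "d", C = "q", V1 = 2L)
df2 <- data.frame(C = "u", R = "t", V2 = 5L)

# rbind() matches data frame columns by name, so differing names raise an error.
err <- tryCatch(rbind(df1, df2), error = function(e) conditionMessage(e))
print(err)

# setNames() returns a renamed copy, leaving df1 and df2 untouched.
new_cols <- c("B", "T", "V")
result <- rbind(setNames(df1, new_cols), setNames(df2, new_cols))
print(result)
```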
Upvotes: 4