Reputation: 805

rbind two data frames with the same number of columns

I have two data frames:

 S     C     V1
"d"   "q"    2
...

 C     R     V2
"u"   "t"    5
...

I want to achieve this:

 B     T       V
"d"   "q"      2
...
"u"  "t"       5
...

How can I do that efficiently?

Upvotes: 1

Answers (4)

Jan

Reputation: 5254

The explicit request for efficiency, and jwarz's comment on it, prompted me to briefly compare the three proposed solutions.

Summary

The bind_rows() approach is the fastest solution (as stated by jwarz), with the disadvantage of depending on an extra package.
The Map() approach has the advantage of being flexible but is the slowest (its median time is over 35% higher than dplyr's in both benchmarks below).
The speed advantage of bind_rows() appears to grow with larger data frames.

Code and Results

library(dplyr)
library(microbenchmark, quietly = TRUE)

df1 <- structure(list(S = "d", C = "q", V1 = 2L), 
                 class = "data.frame", row.names = c(NA, -1L))
df2 <- structure(list(C = "u", R = "t", V2 = 5L), 
                 class = "data.frame", row.names = c(NA, -1L))
new_cols <- c("B", "T", "V")

solution1 <- function(df1, df2, new_cols) {
  names(df1) <- new_cols
  names(df2) <- new_cols
  return( rbind(df1, df2) )
}

solution2 <- function(df1, df2, new_cols) {
  dftmp <- Map(`names<-`, list(df1, df2), list(value = new_cols))
  return( do.call(rbind, dftmp) )
}

solution3 <- function(df1, df2, new_cols) {
  colnames(df2) <- colnames(df1) <- new_cols
  return( bind_rows(df1, df2) )
}

microbenchmark(
  rbind = solution1(df1, df2, new_cols), 
  Map   = solution2(df1, df2, new_cols),
  dplyr = solution3(df1, df2, new_cols),
  times = 1E4L
)
#> Unit: microseconds
#>   expr  min   lq     mean median   uq     max neval
#>  rbind 70.8 78.4 87.57165   82.0 88.0  2613.2 10000
#>    Map 81.3 88.9 99.86045   93.0 99.4 10521.7 10000
#>  dplyr 53.3 62.8 70.44836   68.3 71.0  2362.6 10000

df1 <- structure(list(S = letters[sample(1:26, 999L, replace = TRUE)], 
                      C = letters[sample(1:26, 999L, replace = TRUE)], 
                      V1 = sample(1:26, 999L, replace = TRUE)), 
                 class = "data.frame", row.names = 1:999)
df2 <- structure(list(C = letters[sample(1:26, 999L, replace = TRUE)], 
                      R = letters[sample(1:26, 999L, replace = TRUE)], 
                      V2 = sample(1:26, 999L, replace = TRUE)), 
                 class = "data.frame", row.names = 1:999)

microbenchmark(
  rbind = solution1(df1, df2, new_cols),
  Map   = solution2(df1, df2, new_cols),
  dplyr = solution3(df1, df2, new_cols),
  times = 1E4L
)
#> Unit: microseconds
#>   expr   min    lq      mean median     uq    max neval
#>  rbind 119.5 130.1 140.10275  134.2 141.10 2751.9 10000
#>    Map 130.4 141.4 153.46169  145.8 152.65 3978.6 10000
#>  dplyr  58.3  70.8  78.97621   77.8  80.55 2289.1 10000

Created on 2021-01-03 by the reprex package (v0.3.0)
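The relative differences in the summary can be read off the median columns above. A minimal sketch of that arithmetic, using only the medians reported in the two benchmark tables (the variable names are illustrative, not part of the original answer):

```r
# Median timings in microseconds, copied from the two benchmark outputs above.
medians_small <- c(rbind = 82.0,  Map = 93.0,  dplyr = 68.3)  # 1-row data frames
medians_large <- c(rbind = 134.2, Map = 145.8, dplyr = 77.8)  # 999-row data frames

# Percentage by which each approach's median exceeds the dplyr median.
pct_slower <- function(medians) {
  round((medians / medians[["dplyr"]] - 1) * 100, 1)
}

pct_slower(medians_small)
pct_slower(medians_large)
```

The gap to dplyr is clearly wider in the second run, which is what the last point of the summary refers to. These figures come from a single machine and one run per size, so they indicate a trend rather than a general result.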

Upvotes: 2

jwarz

Reputation: 531

bind_rows() matches columns by name, so it cannot bind data frames while ignoring their column names. You can write a small function to work around this:

library(dplyr)
force_bind <- function(df1, df2, x_names) {
  colnames(df2) <- colnames(df1) <- x_names
  bind_rows(df1, df2)
}

force_bind(df1, df2, c("B", "T", "V"))
##   B T V
## 1 d q 2
## 2 u t 5
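To see why the renaming step is needed, here is a small base R sketch (not part of the original answer) built on the question's example data. Base R's rbind() also pairs data frame columns by name and stops when the names differ, whereas bind_rows() would keep every distinct name and fill the gaps with NA. The `msg` variable is only there for illustration.

```r
# Example data from the question: same number of columns, different names.
df1 <- data.frame(S = "d", C = "q", V1 = 2L)
df2 <- data.frame(C = "u", R = "t", V2 = 5L)

# Without renaming, rbind() on data frames raises an error because the
# column names of df2 do not match those of df1. Capture the message
# instead of letting the error stop the script.
msg <- tryCatch({
  rbind(df1, df2)
  NA_character_            # only reached if no error occurred
}, error = function(e) conditionMessage(e))

print(msg)

# After giving both data frames the same names, the bind succeeds.
names(df1) <- names(df2) <- c("B", "T", "V")
combined <- rbind(df1, df2)
```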

Upvotes: 2

Rui Barradas

Reputation: 76450

Here is a base R solution that doesn't depend on the number of data.frames you want to rbind together.

dftmp <- Map(`names<-`, list(df1, df2), list(value = c("B", "T", "V")))
df_final <- do.call(rbind, dftmp)

df_final
#  B T V
#1 d q 2
#2 u t 5

Data

df1 <- read.table(text = "
S     C     V1
d   q    2
", header = TRUE)

df2 <- read.table(text = "
C     R     V2
u   t    5
", header = TRUE)
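Because the approach works on a list, it extends to more than two data frames. A minimal sketch, assuming a hypothetical helper `bind_renamed()` and a made-up third data frame `df3` (neither is part of the original answer); it uses setNames() in place of `names<-` and adds a column-count check:

```r
# Hypothetical helper: rename every data frame in a list, then stack them.
bind_renamed <- function(dfs, new_names) {
  # Every data frame must have exactly as many columns as there are new names.
  stopifnot(all(vapply(dfs, ncol, integer(1)) == length(new_names)))
  renamed <- lapply(dfs, setNames, nm = new_names)
  out <- do.call(rbind, renamed)
  rownames(out) <- NULL    # use plain sequential row names
  out
}

df1 <- data.frame(S = "d", C = "q", V1 = 2L)
df2 <- data.frame(C = "u", R = "t", V2 = 5L)
df3 <- data.frame(X = "a", Y = "b", Z = 9L)  # made-up third data frame

res <- bind_renamed(list(df1, df2, df3), c("B", "T", "V"))
res
```

The check at the top is a design choice for this sketch: it fails early with a clear condition rather than letting a length mismatch surface later inside rbind().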

Upvotes: 2

Ronak Shah

Reputation: 388982

You can give both data frames the same column names and then combine them.

new_cols <- c('B','T', 'V')
names(df1) <- new_cols
names(df2) <- new_cols

result <- rbind(df1, df2)

Upvotes: 4
