v_head

Reputation: 805

rbinding two data frames with the same number of columns

I have two data frames:

1)

 S     C     V1
"d"   "q"    2
...

2)

 C     R     V2
"u"   "t"    5
...

I want to achieve this:

 B     T       V
"d"   "q"      2
...
"u"  "t"       5
...

How can I do that efficiently?

Upvotes: 1

Views: 79

Answers (4)

Jan

Reputation: 5254

The explicit request for efficiency, and jwarz's comment on it, made me briefly compare the three proposed solutions.

Summary

  • The bind_rows() approach is the fastest solution (as stated by jwarz), with the disadvantage of depending on an extra package.
  • The Map() approach has the advantage of being flexible but is rather slow (over 35% slower than dplyr by median time).
  • The speed advantage of bind_rows() seems to grow with larger data frames: by median time, plain rbind() is about 20% slower on the one-row inputs and about 70% slower on the 999-row inputs.

Code and Results

library(dplyr)
library(microbenchmark, quietly = TRUE)

df1 <- structure(list(S = "d", C = "q", V1 = 2L), 
                 class = "data.frame", row.names = c(NA, -1L))
df2 <- structure(list(C = "u", R = "t", V2 = 5L), 
                 class = "data.frame", row.names = c(NA, -1L))
new_cols <- c("B", "T", "V")

solution1 <- function(df1, df2, new_cols) {
  names(df1) <- new_cols
  names(df2) <- new_cols
  return( rbind(df1, df2) )
}

solution2 <- function(df1, df2, new_cols) {
  dftmp <- Map(`names<-`, list(df1, df2), list(value = new_cols))
  return( do.call(rbind, dftmp) )
}

solution3 <- function(df1, df2, new_cols) {
  colnames(df2) <- colnames(df1) <- new_cols
  return( bind_rows(df1, df2) )
}

microbenchmark(
  rbind = solution1(df1, df2, new_cols), 
  Map   = solution2(df1, df2, new_cols),
  dplyr = solution3(df1, df2, new_cols),
  times = 1E4L
)
#> Unit: microseconds
#>   expr  min   lq     mean median   uq     max neval
#>  rbind 70.8 78.4 87.57165   82.0 88.0  2613.2 10000
#>    Map 81.3 88.9 99.86045   93.0 99.4 10521.7 10000
#>  dplyr 53.3 62.8 70.44836   68.3 71.0  2362.6 10000

df1 <- structure(list(S = letters[sample(1:26, 999L, replace = TRUE)], 
                      C = letters[sample(1:26, 999L, replace = TRUE)], 
                      V1 = sample(1:26, 999L, replace = TRUE)), 
                 class = "data.frame", row.names = 1:999)
df2 <- structure(list(C = letters[sample(1:26, 999L, replace = TRUE)], 
                      R = letters[sample(1:26, 999L, replace = TRUE)], 
                      V2 = sample(1:26, 999L, replace = TRUE)), 
                 class = "data.frame", row.names = 1:999)

microbenchmark(
  rbind = solution1(df1, df2, new_cols),
  Map   = solution2(df1, df2, new_cols),
  dplyr = solution3(df1, df2, new_cols),
  times = 1E4L
)
#> Unit: microseconds
#>   expr   min    lq      mean median     uq    max neval
#>  rbind 119.5 130.1 140.10275  134.2 141.10 2751.9 10000
#>    Map 130.4 141.4 153.46169  145.8 152.65 3978.6 10000
#>  dplyr  58.3  70.8  78.97621   77.8  80.55 2289.1 10000

Created on 2021-01-03 by the reprex package (v0.3.0)

Upvotes: 2

jwarz

Reputation: 531

bind_rows() matches columns by name, so you can't use it to bind rows while ignoring column names. But you can write a small function to work around this:

library(dplyr)
force_bind <- function(df1, df2, x_names) {
  # Give both data frames the same column names so bind_rows() can match them
  colnames(df2) <- colnames(df1) <- x_names
  bind_rows(df1, df2)
}

force_bind(df1, df2, c("B", "T", "V"))
##   B T V
## 1 d q 2
## 2 u t 5

Upvotes: 2

Rui Barradas

Reputation: 76450

Here is a base R solution that doesn't depend on the number of data.frames you want to rbind together.

dftmp <- Map(`names<-`, list(df1, df2), list(value = c("B", "T", "V")))
df_final <- do.call(rbind, dftmp)

df_final
#  B T V
#1 d q 2
#2 u t 5

Data

df1 <- read.table(text = "
S     C     V1
d   q    2
", header = TRUE)

df2 <- read.table(text = "
C     R     V2
u   t    5
", header = TRUE)
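
To illustrate the point about not depending on the number of data frames, here is a minimal sketch with three inputs. It is a variation on the Map() call above that uses lapply() with setNames() instead; df3 and its column names are hypothetical and exist only for this example.

```r
# df1 and df2 mirror the question's data; df3 is a made-up third input
df1 <- data.frame(S = "d", C = "q", V1 = 2L)
df2 <- data.frame(C = "u", R = "t", V2 = 5L)
df3 <- data.frame(X = "a", Y = "b", Z = 7L)

new_cols <- c("B", "T", "V")

# setNames() returns a renamed copy, so the original data frames stay untouched
df_list <- lapply(list(df1, df2, df3), setNames, new_cols)

# do.call() passes every element of the list to rbind() in one call
df_final <- do.call(rbind, df_list)
df_final
```

Adding a fourth data frame only means adding it to the list; the rest of the code stays the same.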

Upvotes: 2

Ronak Shah

Reputation: 388982

You can give both data frames the same column names and then combine them with rbind().

new_cols <- c('B','T', 'V')
names(df1) <- new_cols
names(df2) <- new_cols

result <- rbind(df1, df2)
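
The renaming step matters because rbind() on data frames matches columns by name rather than by position. The sketch below, which assumes the one-row data from the question, first captures the error raised when the names differ and then shows a one-line variant using setNames() that leaves the inputs unmodified.

```r
df1 <- data.frame(S = "d", C = "q", V1 = 2L)
df2 <- data.frame(C = "u", R = "t", V2 = 5L)

# Without renaming, rbind() stops with an error because the names do not match
msg <- tryCatch(rbind(df1, df2), error = function(e) conditionMessage(e))
msg

# One-line variant: setNames() renames copies, so df1 and df2 keep their names
new_cols <- c("B", "T", "V")
result <- rbind(setNames(df1, new_cols), setNames(df2, new_cols))
result
```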

Upvotes: 4
