Reputation: 1829

Combine two or more columns in a dataframe into a new column with a new name

For example if I have this:

n = c(2, 3, 5) 
s = c("aa", "bb", "cc") 
b = c(TRUE, FALSE, TRUE) 
df = data.frame(n, s, b)

  n  s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE

Then how do I combine the two columns n and s into a new column named x such that it looks like this:

  n  s     b     x
1 2 aa  TRUE  2 aa
2 3 bb FALSE  3 bb
3 5 cc  TRUE  5 cc

Upvotes: 158

Answers (9)

Iyar Lin

Reputation: 631

I'd like to also propose a method for concatenating a large/unknown number of columns. The solution proposed by Ben Ernest can be pretty slow on large datasets.

Below is my proposed solution:

# setup data.frame - Making it large for the time benchmarking
n = rep(c(2, 3, 5), 1000000)
s = rep(c("aa", "bb", "cc"), 1000000)
b = rep(c(TRUE, FALSE, TRUE), 1000000) 
df = data.frame(n, s, b)

# The proposed solution:
colNames = c("n", "s") # could be any number of column names here
df$x <- do.call(paste0, c(df[,colNames], sep=" "))

# running system.time on this yields:
# user  system elapsed 
# 1.861   0.005   1.865 

# compare with alternative method:
df$x <- apply(df[, colNames, drop = F], MARGIN = 1, 
                         FUN = function(i) paste(i, collapse = ""))
# running system.time on this yields:
# user  system elapsed 
#  16.127   0.147  16.304

Upvotes: 1

Ben Ernest

Reputation: 498

There are other great answers, but in the case where you don't know the column names or the number of columns you want to concatenate beforehand, the following is useful.

df = data.frame(x = letters[1:5], y = letters[6:10], z = letters[11:15])
colNames = colnames(df) # could be any number of column names here
df$newColumn = apply(df[, colNames, drop = F], MARGIN = 1, FUN = function(i) paste(i, collapse = ""))

Upvotes: 6

avallecam

Reputation: 679

Instead of

paste (default spaces),
paste0 (force the inclusion of missing NA as character) or
unite (constrained to 2 columns and 1 separator),

I'd suggest an alternative as flexible as paste0 but more careful with NA: stringr::str_c

library(tidyverse)

# check the missing value!!
df <- tibble(
  n = c(2, 2, 8),
  s = c("aa", "aa", NA_character_),
  b = c(TRUE, FALSE, TRUE)
)

df %>% 
  mutate(
    paste = paste(n,"-",s,".",b),
    paste0 = paste0(n,"-",s,".",b),
    str_c = str_c(n,"-",s,".",b)
  ) %>% 

  # convert missing value to ""
  mutate(
    s_2=str_replace_na(s,replacement = "")
  ) %>% 
  mutate(
    str_c_2 = str_c(n,"-",s_2,".",b)
  )
#> # A tibble: 3 x 8
#>       n s     b     paste          paste0     str_c      s_2   str_c_2   
#>   <dbl> <chr> <lgl> <chr>          <chr>      <chr>      <chr> <chr>     
#> 1     2 aa    TRUE  2 - aa . TRUE  2-aa.TRUE  2-aa.TRUE  "aa"  2-aa.TRUE 
#> 2     2 aa    FALSE 2 - aa . FALSE 2-aa.FALSE 2-aa.FALSE "aa"  2-aa.FALSE
#> 3     8 <NA>  TRUE  8 - NA . TRUE  8-NA.TRUE  <NA>       ""    8-.TRUE

^{Created on 2020-04-10 by the reprex package (v0.3.0)}

extra note from str_c documentation

Like most other R functions, missing values are "infectious": whenever a missing value is combined with another string the result will always be missing. Use str_replace_na() to convert NA to "NA"

Upvotes: 8

yanes

Reputation: 440

We can use paste0:

df$combField <- paste0(df$x, df$y)

If you do not want any padding space introduced in the concatenated field. This is more useful if you are planning to use the combined field as a unique id that represents combinations of two fields.

Upvotes: 14

Little Bee

Reputation: 1225

For inserting a separator:

df$x <- paste(df$n, "-", df$s)

Upvotes: 57

Ferroao

Reputation: 3066

Some examples with NAs and their removal using apply

n = c(2, NA, NA) 
s = c("aa", "bb", NA) 
b = c(TRUE, FALSE, NA) 
c = c(2, 3, 5) 
d = c("aa", NA, "cc") 
e = c(TRUE, NA, TRUE) 
df = data.frame(n, s, b, c, d, e)

paste_noNA <- function(x,sep=", ") {
gsub(", " ,sep, toString(x[!is.na(x) & x!="" & x!="NA"] ) ) }

sep=" "
df$x <- apply( df[ , c(1:6) ] , 1 , paste_noNA , sep=sep)
df

Upvotes: 16

Quentin Perrier

Reputation: 546

As already mentioned in comments by Uwe and UseR, a general solution in the tidyverse format would be to use the command unite:

library(tidyverse)

n = c(2, 3, 5) 
s = c("aa", "bb", "cc") 
b = c(TRUE, FALSE, TRUE) 

df = data.frame(n, s, b) %>% 
  unite(x, c(n, s), sep = " ", remove = FALSE)

Upvotes: 34

sbha

Reputation: 10432

Using dplyr::mutate:

library(dplyr)
df <- mutate(df, x = paste(n, s)) 

df 
> df
  n  s     b    x
1 2 aa  TRUE 2 aa
2 3 bb FALSE 3 bb
3 5 cc  TRUE 5 cc

Upvotes: 22

mnel

Reputation: 115505

Use paste.

 df$x <- paste(df$n,df$s)
 df
#   n  s     b    x
# 1 2 aa  TRUE 2 aa
# 2 3 bb FALSE 3 bb
# 3 5 cc  TRUE 5 cc

Upvotes: 179

Combine two or more columns in a dataframe into a new column with a new name

Answers (9)

Related Questions