How to generate a new column in an R dataframe with ordered items from multiple columns

I have a dataframe in R that looks like this:

df <-
data.frame(
"first_col" = c("apple", "apple", "banana", "banana", "cacao", "dough"),
"second_col" = c("apple", "apple", "banana", "banana", "apple", "dough"),
"third_col" = c("banana", "apple", "banana", "banana", "banana", "apple"),
stringsAsFactors = FALSE
)

and I want to generate a new column, using base R, that contains the values of the three previous columns sorted within each row and joined into one string.

If I wanted it unsorted, I could do this:

df$label <- paste(df$first_col,
                  df$second_col,
                  df$third_col,
                  sep = " - ")

If I try to sort the items with sort like this:

df$label <- paste(sort(df$first_col,
                                     df$second_col,
                                     df$third_col),
                              sep = " - ")

I get this error:

Error in sort(df$first_col, df$second_col, df$third_col) : 
  'decreasing' must be a length-1 logical vector.
Did you intend to set 'partial'?

So obviously I'm doing something wrong. Looking at the docs, it seems sort() wants a single vector, so I tried combining the columns with c() like this:

df$label <- paste(sort(c(df$first_col,
                                   df$second_col,
                                   df$third_col)),
                              sep = " - ")

but I get another error:

Error in `$<-.data.frame`(`*tmp*`, label, value = c("apple",  : 
  replacement has 18 rows, data has 6

It looks like it's generating three columns and not just one. What am I doing wrong?

From a dataframe that looks like this:

  first_col second_col third_col
1     apple      apple    banana
2     apple      apple     apple
3    banana     banana    banana
4    banana     banana    banana
5     cacao      apple    banana
6     dough      dough     apple

I'd like to obtain something that looks like this:

  first_col second_col third_col                       label
1     apple      apple    banana      apple - apple - banana
2     apple      apple     apple       apple - apple - apple
3    banana     banana    banana    banana - banana - banana
4    banana     banana    banana    banana - banana - banana
5     cacao      apple    banana      apple - banana - cacao
6     dough      dough     apple       apple - dough - dough

You can tell it is sorted by looking at rows 5 and 6.

Upvotes: 3

Views: 73

Answers (2)

NelsonGon

Reputation: 13309

With base:

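# Restrict to the three source columns so that any column added earlier
# (e.g. label) is not pulled into the sort.
df$combined <- apply(df[1:3], 1, function(x) paste(sort(x), collapse = "-"))
df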
  first_col second_col third_col               combined
1     apple      apple    banana   apple-apple-banana
2     apple      apple     apple    apple-apple-apple
3    banana     banana    banana banana-banana-banana
4    banana     banana    banana banana-banana-banana
5     cacao      apple    banana   apple-banana-cacao
6     dough      dough     apple    apple-dough-dough
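
As for why the attempts in the question fail: sort() takes a single vector, so in the first call df$second_col is matched to the decreasing argument. In the second call, c() concatenates the three columns into one vector of 18 values, which sort() orders as a whole; paste() with sep = " - " then has only one argument to work with and returns all 18 values, which cannot fit in a 6-row data frame. The sketch below illustrates this and shows a per-row alternative with mapply(), which avoids converting the data frame to a matrix. The helper name sort_and_join is only an illustration, not something from the question.

```r
df <- data.frame(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple"),
  stringsAsFactors = FALSE
)

# c() concatenates the three columns into a single vector of 18 values,
# so sort() orders all of them together, not row by row.
pooled <- sort(c(df$first_col, df$second_col, df$third_col))
length(pooled)  # 18

# With a single vector argument, `sep` has nothing to separate,
# so paste() returns one string per element: still 18 values.
length(paste(pooled, sep = " - "))  # 18

# Hypothetical helper: sort the values of one row and join them.
sort_and_join <- function(...) paste(sort(c(...)), collapse = " - ")

# mapply() calls the helper once per row, taking one value from each column.
df$label <- mapply(sort_and_join, df$first_col, df$second_col, df$third_col,
                   USE.NAMES = FALSE)
df$label
```

The key difference from the question's code is collapse, which joins the elements of one vector into a single string, whereas sep only separates the different arguments passed to paste().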

To use only columns 1 and 2:

df$combined <- apply(df[1:2], 1, function(x) paste(sort(x), collapse = " - "))
df
  first_col second_col third_col        combined
1     apple      apple    banana   apple - apple
2     apple      apple     apple   apple - apple
3    banana     banana    banana banana - banana
4    banana     banana    banana banana - banana
5     cacao      apple    banana   apple - cacao
6     dough      dough     apple   dough - dough
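
Selecting by position stops working as intended if the column order changes, so it may be safer to select by name. A minimal sketch, where cols is a hypothetical variable holding the names of the columns to combine:

```r
df <- data.frame(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple"),
  stringsAsFactors = FALSE
)

# Hypothetical: names of the columns whose values should be sorted and joined.
cols <- c("first_col", "second_col")

# df[cols] keeps only those columns; apply(..., 1, ...) then works row by row.
df$combined <- apply(df[cols], 1, function(x) paste(sort(x), collapse = " - "))
df$combined
```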

Data

df <- structure(list(first_col = c("apple", "apple", "banana", "banana", 
"cacao", "dough"), second_col = c("apple", "apple", "banana", 
"banana", "apple", "dough"), third_col = c("banana", "apple", 
"banana", "banana", "banana", "apple")), row.names = c(NA, 
-6L), class = "data.frame")

Upvotes: 2

EJJ

Reputation: 1513

Another way, using dplyr's mutate() and purrr's pmap() family:

library(dplyr)
library(purrr)

df <-
  data.frame(
    "first_col" = c("apple", "apple", "banana", "banana", "cacao", "dough"),
    "second_col" = c("apple", "apple", "banana", "banana", "apple", "dough"),
    "third_col" = c("banana", "apple", "banana", "banana", "banana", "apple"),
    stringsAsFactors = FALSE
  )

# pmap_chr() returns a character vector; plain pmap() would make label a list column.
df %>% 
  mutate(label = pmap_chr(list(first_col, second_col, third_col),
                          function(x, y, z) paste(sort(c(x, y, z)), collapse = " - ")))

# first_col second_col third_col                    label
# 1     apple      apple    banana   apple - apple - banana
# 2     apple      apple     apple    apple - apple - apple
# 3    banana     banana    banana banana - banana - banana
# 4    banana     banana    banana banana - banana - banana
# 5     cacao      apple    banana   apple - banana - cacao
# 6     dough      dough     apple    apple - dough - dough

Upvotes: 2
