I have a dataframe in R that looks like this:
df <-
data.frame(
"first_col" = c("apple", "apple", "banana", "banana", "cacao", "dough"),
"second_col" = c("apple", "apple", "banana", "banana", "apple", "dough"),
"third_col" = c("banana", "apple", "banana", "banana", "banana", "apple"),
stringsAsFactors = FALSE
)
and I want to use base R to generate a new column made by sorting the values of the three previous columns within each row and pasting them together.
If I wanted it unsorted, I could do this:
df$label <- paste(df$first_col,
df$second_col,
df$third_col,
sep = " - ")
If I try to sort the items with sort(), like this:
df$label <- paste(sort(df$first_col,
df$second_col,
df$third_col),
sep = " - ")
I get this error:
Error in sort(df$first_col, df$second_col, df$third_col) :
'decreasing' must be a length-1 logical vector.
Did you intend to set 'partial'?
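The error comes from how sort() matches its arguments: the generic's signature is sort(x, decreasing = FALSE, ...), so it sorts a single vector x, and the second positional argument (here df$second_col) is matched to decreasing rather than treated as more data. A minimal base-R check, reusing the question's data:

```r
df <- data.frame(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple"),
  stringsAsFactors = FALSE
)

# sort() takes ONE vector `x`; its second positional argument is `decreasing`
print(names(formals(sort)))

# This call therefore passes a character vector as `decreasing`,
# which sort() rejects before doing any sorting
err <- tryCatch(
  sort(df$first_col, df$second_col, df$third_col),
  error = function(e) conditionMessage(e)
)
print(err)
```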
So obviously I'm doing something wrong. Looking at the docs, it seems like sort() wants a single vector, so I tried combining the columns with c():
df$label <- paste(sort(c(df$first_col,
df$second_col,
df$third_col)),
sep = " - ")
but I get another error:
Error in `$<-.data.frame`(`*tmp*`, label, value = c("apple", :
replacement has 18 rows, data has 6
It looks like it's producing 18 values (all three columns' worth) instead of one label per row. What am I doing wrong?
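Here c() stacks the three columns into one vector of 18 strings, sort() orders all 18 together rather than within each row, and paste() called with a single vector and only sep has nothing to join, so it returns one string per element. A small sketch with the question's data that shows where the 18 comes from:

```r
df <- data.frame(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple"),
  stringsAsFactors = FALSE
)

# c() concatenates the three columns end to end: 6 + 6 + 6 values
combined <- c(df$first_col, df$second_col, df$third_col)
length(combined)  # 18

# sort() orders all 18 values together, not row by row
sorted_all <- sort(combined)

# With one vector argument, `sep` joins nothing, so paste() returns
# one string per element: 18 values for a 6-row data frame
res <- paste(sorted_all, sep = " - ")
length(res)  # 18
nrow(df)     # 6
```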
From a dataframe that looks like this:
  first_col second_col third_col
1     apple      apple    banana
2     apple      apple     apple
3    banana     banana    banana
4    banana     banana    banana
5     cacao      apple    banana
6     dough      dough     apple
I'd like to obtain something that looks like this:
  first_col second_col third_col                    label
1     apple      apple    banana   apple - apple - banana
2     apple      apple     apple    apple - apple - apple
3    banana     banana    banana banana - banana - banana
4    banana     banana    banana banana - banana - banana
5     cacao      apple    banana   apple - banana - cacao
6     dough      dough     apple    apple - dough - dough
You can tell it is sorted by looking at rows 5 and 6.
With base R:
# Sort each row of the first three columns and join the values with "-"
df$combined <- apply(df[1:3], 1, function(x) paste(sort(x), collapse = "-"))
df
  first_col second_col third_col             combined
1     apple      apple    banana   apple-apple-banana
2     apple      apple     apple    apple-apple-apple
3    banana     banana    banana banana-banana-banana
4    banana     banana    banana banana-banana-banana
5     cacao      apple    banana   apple-banana-cacao
6     dough      dough     apple    apple-dough-dough
To use only columns 1 and 2:
df$combined<-apply(df[1:2],1,function(x) paste(sort(x),collapse=" - "))
df
  first_col second_col third_col        combined
1     apple      apple    banana   apple - apple
2     apple      apple     apple   apple - apple
3    banana     banana    banana banana - banana
4    banana     banana    banana banana - banana
5     cacao      apple    banana   apple - cacao
6     dough      dough     apple   dough - dough
Data
df <- structure(list(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple")
), row.names = c(NA, -6L), class = "data.frame")
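Another base-R option, offered as a sketch rather than part of the answer above, is mapply(), which walks the three columns in parallel and calls the function once per row without first converting the data frame to a matrix as apply() does. The column names are the question's; USE.NAMES = FALSE stops mapply() from using the first column's values as names of the result.

```r
df <- data.frame(
  first_col  = c("apple", "apple", "banana", "banana", "cacao", "dough"),
  second_col = c("apple", "apple", "banana", "banana", "apple", "dough"),
  third_col  = c("banana", "apple", "banana", "banana", "banana", "apple"),
  stringsAsFactors = FALSE
)

# For each row i, mapply() passes first_col[i], second_col[i], third_col[i]
# as separate arguments; c(...) gathers them, sort() orders them,
# and paste(collapse = ) joins them into one string
df$label <- mapply(
  function(...) paste(sort(c(...)), collapse = " - "),
  df$first_col, df$second_col, df$third_col,
  USE.NAMES = FALSE
)
df
```

Because each call returns a single string, mapply() simplifies the results to a plain character vector, one label per row.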
Another way, using dplyr's mutate() and purrr's pmap():
library(dplyr)
library(purrr)
df <-
data.frame(
"first_col" = c("apple", "apple", "banana", "banana", "cacao", "dough"),
"second_col" = c("apple", "apple", "banana", "banana", "apple", "dough"),
"third_col" = c("banana", "apple", "banana", "banana", "banana", "apple"),
stringsAsFactors = FALSE
)
df %>%
  # pmap_chr() returns a character column; plain pmap() would return a list column
  mutate(label = pmap_chr(list(first_col, second_col, third_col),
                          function(x, y, z) paste(sort(c(x, y, z)), collapse = " - ")))
#   first_col second_col third_col                    label
# 1     apple      apple    banana   apple - apple - banana
# 2     apple      apple     apple    apple - apple - apple
# 3    banana     banana    banana banana - banana - banana
# 4    banana     banana    banana banana - banana - banana
# 5     cacao      apple    banana   apple - banana - cacao
# 6     dough      dough     apple    apple - dough - dough