Reputation: 91
I am trying to extract the unique values within each row of a data frame in R without using a for loop.
df <- data.frame(customer = c('joe','jane','john','mary'), fruit = c('orange, apple, orange', NA, 'apple', 'orange, orange'),
                 stringsAsFactors = FALSE)  # keeps fruit as character in R < 4.0 too
df
customer fruit
1 joe orange, apple, orange
2 jane <NA>
3 john apple
4 mary orange, orange
What I want for the fruit column is:
'orange, apple', NA, 'apple', 'orange'
customer fruit
1 joe orange, apple
2 jane <NA>
3 john apple
4 mary orange
I tried something along the lines of
apply(df, 1, function(x) unique(unlist(str_split(x[, "fruit"], ", "))))
but it does not work.
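The attempt fails because apply() first converts the data frame to a character matrix, so each x inside the function is a named character vector rather than a data frame; x[, "fruit"] therefore stops with "incorrect number of dimensions". A minimal sketch of the same apply() idea with the indexing fixed, using base strsplit() instead of stringr::str_split() and assuming values are separated by ", " as in the sample (the helper name dedup_row is only for illustration):

```r
df <- data.frame(customer = c("joe", "jane", "john", "mary"),
                 fruit = c("orange, apple, orange", NA, "apple", "orange, orange"),
                 stringsAsFactors = FALSE)

# Hypothetical helper. Inside apply(), each row arrives as a named
# character vector, so index it by name with [[ ]] instead of [, ]
dedup_row <- function(x) {
  if (is.na(x[["fruit"]])) return(NA_character_)  # keep missing values as NA
  parts <- strsplit(x[["fruit"]], ", ", fixed = TRUE)[[1]]
  paste(unique(parts), collapse = ", ")
}

df$fruit <- unname(apply(df, 1, dedup_row))
df
```

Because the helper always returns a single string (or NA), apply() simplifies the results to a plain character vector that can be assigned straight back to the column.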
How can I get unique values within each row in the dataframe?
Upvotes: 5
Views: 1478
Reputation: 21908
Updated solution: I modified my code to match the output you want.
library(dplyr)
library(tidyr)
df %>%
  separate_rows(fruit) %>%
  distinct(customer, fruit) %>%
  group_by(customer) %>%
  summarise(fruit = paste(sort(fruit, na.last = FALSE), collapse = ", "))
# A tibble: 4 x 2
customer fruit
<chr> <chr>
1 jane NA
2 joe apple, orange
3 john apple
4 mary orange
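Note that this pipeline sorts each customer's fruit alphabetically, reorders the rows by customer, and turns Jane's missing value into the string "NA", because paste() converts NA to text. A minimal base R sketch of the same split, deduplicate, and recombine steps that keeps the original order and a real NA, assuming each customer name appears only once:

```r
df <- data.frame(customer = c("joe", "jane", "john", "mary"),
                 fruit = c("orange, apple, orange", NA, "apple", "orange, orange"),
                 stringsAsFactors = FALSE)

# Long format: one row per (customer, fruit) pair, like tidyr::separate_rows()
parts <- strsplit(df$fruit, ",\\s*")
long <- data.frame(customer = rep(df$customer, lengths(parts)),
                   fruit = unlist(parts),
                   stringsAsFactors = FALSE)

# Drop repeated pairs, like dplyr::distinct(customer, fruit)
long <- long[!duplicated(long), ]

# Collapse back to one string per customer; factor levels keep the row order
collapsed <- tapply(long$fruit,
                    factor(long$customer, levels = df$customer),
                    function(x) {
                      if (all(is.na(x))) NA_character_ else paste(x, collapse = ", ")
                    })

# Assumes customer names are unique, so they work as lookup keys
df$fruit <- as.character(collapsed[df$customer])
df
```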
Upvotes: 0
Reputation: 26218
A simple pipe using dplyr, stringr::str_split and purrr::map:
library(dplyr); library(stringr); library(purrr)
df %>% mutate(fruit = str_split(fruit, ", "),
              fruit = map(fruit, ~ unique(.x)))
customer fruit
1 joe orange, apple
2 jane NA
3 john apple
4 mary orange
Or, using base R only:
df$fruit <- Map(unique, strsplit(df$fruit, ", "))
df
customer fruit
1 joe orange, apple
2 jane NA
3 john apple
4 mary orange
Note: this assumes the values are always separated by a comma followed by a single space, as in the sample.
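Both versions above leave fruit as a list column; it only prints like comma-separated text. If downstream code expects a plain character column, a short base R sketch that collapses each element back into one string and keeps NA as a true missing value:

```r
df <- data.frame(customer = c("joe", "jane", "john", "mary"),
                 fruit = c("orange, apple, orange", NA, "apple", "orange, orange"),
                 stringsAsFactors = FALSE)

# Base R approach from above: fruit becomes a list column
df$fruit <- Map(unique, strsplit(df$fruit, ", "))
stopifnot(is.list(df$fruit))

# Collapse each list element to a single string; vapply enforces character(1)
df$fruit <- vapply(df$fruit, function(x) {
  if (all(is.na(x))) NA_character_ else paste(x, collapse = ", ")
}, character(1))
df
```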
Upvotes: 1
Reputation: 388982
Base R option:
Split each string on commas, keep the unique values, and paste them back into a comma-separated string.
df$fruit <- sapply(strsplit(df$fruit, ',\\s+'), function(x) toString(unique(x)))
df
# customer fruit
#1 joe orange, apple
#2 jane NA
#3 john apple
#4 mary orange
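One caveat: for Jane's row, strsplit() returns NA, and toString(NA) produces the string "NA", so the printed NA above is text and is.na() returns FALSE for it. A minimal variant that keeps a real NA, uses vapply() so the result is guaranteed to be one string per row, and splits on ",\\s*" so that commas without a following space are handled too:

```r
df <- data.frame(customer = c("joe", "jane", "john", "mary"),
                 fruit = c("orange, apple, orange", NA, "apple", "orange, orange"),
                 stringsAsFactors = FALSE)

df$fruit <- vapply(strsplit(df$fruit, ",\\s*"), function(x) {
  # strsplit(NA) gives NA; return a real NA instead of the string "NA"
  if (all(is.na(x))) NA_character_ else toString(unique(x))
}, character(1))

is.na(df$fruit)  # TRUE only for Jane's row
```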
Upvotes: 4
Reputation: 199
Here is a potential solution using base R, with no extra packages. There are a lot of nested brackets, but I think it works.
df$fruit <- lapply(seq_len(nrow(df)), function(n)
  unique(trimws(unlist(strsplit(df$fruit[n], ",")))))
Output:
> df
customer fruit
1 joe orange, apple
2 jane NA
3 john apple
4 mary orange
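Because this approach splits on a bare comma and then trims whitespace from each piece, it does not depend on exactly one space after each comma. A small sketch with a made-up input vector that has irregular spacing, applying lapply() directly to the split list instead of indexing rows with 1:nrow(df):

```r
# Hypothetical input with irregular spacing around the commas
x <- c("orange,apple ,  orange", NA, "apple", "orange, orange")

# Split on a bare comma, then trim each piece before deduplicating
res <- lapply(strsplit(x, ","), function(v) unique(trimws(v)))
str(res)
```

The result is still a list, so each element holds a character vector (or NA) rather than a single string.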
Upvotes: 0