ybcha204

Reputation: 91

How to extract unique values within each row in dataframe?

I am trying to extract the unique values within each row of a data frame in R without using a for loop.

df <- data.frame(customer = c('joe','jane','john','mary'), fruit = c('orange, apple, orange', NA, 'apple', 'orange, orange'))

df

  customer                 fruit
1      joe orange, apple, orange
2     jane                  <NA>
3     john                 apple
4     mary        orange, orange

What I want for the fruit column is: 'orange, apple', NA, 'apple', 'orange'

  customer                 fruit
1      joe         orange, apple
2     jane                  <NA>
3     john                 apple
4     mary                orange

I tried something along the lines of

apply(df, 1, function(x) unique(unlist(str_split(x[, "fruit"], ", "))))

and it is not working.

How can I get unique values within each row in the dataframe?

Upvotes: 5

Views: 1478

Answers (4)

Anoushiravan R

Reputation: 21908

Updated solution: I modified my code to match the output format you asked for. Note that sort() orders the fruits alphabetically within each row.

library(dplyr)
library(tidyr)

df %>%
  separate_rows(fruit) %>%
  distinct(customer, fruit) %>%
  group_by(customer) %>%
  summarise(fruit = paste(sort(fruit, na.last = FALSE), collapse = ", "))

# A tibble: 4 x 2
  customer fruit        
  <chr>    <chr>        
1 jane     NA           
2 joe      apple, orange
3 john     apple        
4 mary     orange

Upvotes: 0

AnilGoyal

Reputation: 26218

A simple pipe-based approach using dplyr, stringr::str_split and purrr::map:

library(dplyr); library(stringr); library(purrr)

df %>% mutate(fruit = str_split(fruit, ", "),
              fruit = map(fruit, ~ unique(.x)))
  customer         fruit
1      joe orange, apple
2     jane            NA
3     john         apple
4     mary        orange

Or with base R only:

df$fruit <- Map(unique, strsplit(df$fruit, ", "))
df

  customer         fruit
1      joe orange, apple
2     jane            NA
3     john         apple
4     mary        orange

Note: this assumes every item is separated by a comma followed by a single space, as in the sample data.
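
Both versions above leave fruit as a list column (one character vector per row) rather than a single string per row. If a plain character column like the one in the question is wanted, the pieces can be pasted back together. This is only a sketch built on the base R version, assuming the question's df with fruit stored as character; the intermediate name parts is made up for illustration.

```r
df <- data.frame(customer = c('joe', 'jane', 'john', 'mary'),
                 fruit = c('orange, apple, orange', NA, 'apple', 'orange, orange'),
                 stringsAsFactors = FALSE)

# List of unique items per row (a missing row stays a single NA)
parts <- Map(unique, strsplit(df$fruit, ", "))

# Collapse each vector into one string; keep missing rows as a real NA
df$fruit <- vapply(parts, function(x) {
  if (all(is.na(x))) NA_character_ else paste(x, collapse = ", ")
}, character(1), USE.NAMES = FALSE)
df
```

The explicit NA check is there because paste() would otherwise turn the missing value into the text "NA".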

Upvotes: 1

Ronak Shah

Reputation: 388982

Base R option:

Split each string on commas, keep the unique values, and paste them back into a comma-separated string.

df$fruit <- sapply(strsplit(df$fruit, ',\\s+'), function(x) toString(unique(x)))
df

#  customer         fruit
#1      joe orange, apple
#2     jane            NA
#3     john         apple
#4     mary        orange
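
One caveat: toString(unique(NA)) returns the text "NA", so jane's missing value becomes a string rather than a true NA (a data frame prints both without quotes, which hides the difference). A minimal variant that keeps it missing, assuming the same df as in the question; the helper name dedupe_fruit is made up for illustration.

```r
df <- data.frame(customer = c('joe', 'jane', 'john', 'mary'),
                 fruit = c('orange, apple, orange', NA, 'apple', 'orange, orange'),
                 stringsAsFactors = FALSE)

# Hypothetical helper: de-duplicate one comma-separated string, keeping NA as NA
dedupe_fruit <- function(x) {
  if (is.na(x)) return(NA_character_)
  toString(unique(strsplit(x, ',\\s*')[[1]]))
}

# vapply guarantees a character vector with one string per row
df$fruit <- vapply(df$fruit, dedupe_fruit, character(1), USE.NAMES = FALSE)
df
```

After this, is.na(df$fruit) is TRUE only for jane's row.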

Upvotes: 4

Liam McGrenaghan

Reputation: 199

Here is a potential solution using base R, with no libraries. There are lots of ugly brackets, but I think it works.

df$fruit <- lapply(seq_len(nrow(df)), function(n) unique(trimws(unlist(strsplit(df$fruit[n], ",")))))

Output:

> df
  customer         fruit
1      joe orange, apple
2     jane            NA
3     john         apple
4     mary        orange

Upvotes: 0
