Madhumita

How to remove duplicate comma-separated values from each cell of a column using R

I have a data frame with two columns, ID and Product, as below:

ID  Product
A   Clothing, Clothing Food, Furniture, Furniture
B   Food,Food,Food, Clothing
C   Food, Clothing, Clothing

I need to keep only the unique products for each ID, for example:

ID  Product
A   Clothing, Food, Furniture
B   Food, Clothing
C   Food, Clothing

How do I do this in R?

Answers (1)

akrun

Because the data uses more than one delimiter (commas with or without a following space, and in row A a plain space between "Clothing" and "Food"), one option is to split the 'Product' column on all of those delimiters, keep the unique values, and paste them back together with toString(), grouped by 'ID'. Here we use data.table.
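To see what each step does, here is a small sketch (not part of the original answer) that applies the split, unique and paste steps to row A's value on its own; `x` and `tokens` are illustrative names only.

```r
# Row A's value from the question; note the missing comma in "Clothing Food"
x <- "Clothing, Clothing Food, Furniture, Furniture"

# Split on a comma followed by optional whitespace, or on a run of whitespace
tokens <- strsplit(x, ",\\s*|\\s+")[[1]]
# tokens is c("Clothing", "Clothing", "Food", "Furniture", "Furniture")

# unique() keeps the first occurrence of each token, in order;
# toString() pastes them back together separated by ", "
toString(unique(tokens))
# "Clothing, Food, Furniture"
```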

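library(data.table)
# df1 is the question's data frame; split each ID's Product on a comma
# (plus optional spaces) or on whitespace, keep unique values, rejoin
setDT(df1)[, list(Product = toString(unique(unlist(strsplit(Product,
            ',\\s*|\\s+'))))), by = ID]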
#   ID                   Product
#1:  A Clothing, Food, Furniture
#2:  B            Food, Clothing
#3:  C            Food, Clothing
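If data.table is not available, the same idea can be sketched in base R. This is an illustration under stated assumptions rather than part of the original answer: it rebuilds the question's data as `df1` (with `stringsAsFactors = FALSE` so that `strsplit()` receives character input on older R versions), `dedup_products()` is a hypothetical helper name, and each row is processed independently, which assumes one row per ID as in the question.

```r
# The question's example data (df1 is an assumed name)
df1 <- data.frame(
  ID = c("A", "B", "C"),
  Product = c("Clothing, Clothing Food, Furniture, Furniture",
              "Food,Food,Food, Clothing",
              "Food, Clothing, Clothing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split every cell on a comma (plus optional spaces)
# or on whitespace, drop repeated tokens while keeping first-seen order,
# and paste each cell's tokens back into one string
dedup_products <- function(x) {
  vapply(strsplit(x, ",\\s*|\\s+"),
         function(tokens) toString(unique(tokens)),
         character(1))
}

df1$Product <- dedup_products(df1$Product)
# df1$Product is now
# c("Clothing, Food, Furniture", "Food, Clothing", "Food, Clothing")
```

Because the pattern also splits on whitespace (needed here for "Clothing Food"), a multi-word product name would be broken into separate words; if every product were consistently separated by commas, splitting on ',\\s*' alone would avoid that.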
