How to remove and get the number of unique strings separated by comma in a column in R

Question

I have a data frame, mydf. In each row of the column customer_sample_id, I want to remove the duplicate comma-separated items and get the number of unique items (new.freq), as shown in the result below.

mydf <- structure(list(count = c(6, 3, 3), customer_sample_id = c(
  "AMLM12001KP ( chr2 : chr9 ),1028701 ( chr2 : chr9 ),1028701 ( chr2 : chr9 ),1220901 ( chr2 : chr9 ),AMLM12015WPS ( chr2 : chr9 ),AML203 ( chr2 : chr9 )",
  "AMLM12001KP ( chr2 : chr20 ),1123801 ( chr2 : chr20 ),AMLM12020M-B ( chr2 : chr20 )",
  "AMLM12001KP ( chr4 : chr17 ),AMLM12001KP ( chr4 : chr17 ),1031901 ( chr4 : chr17 )"
)), .Names = c("freq", "customer_sample_id"), row.names = c(1L, 2L, 3L), class = "data.frame")

Desired result:

  new.freq uniq.customer_sample_id
1        5 AMLM12001KP ( chr2 : chr9 ),1028701 ( chr2 : chr9 ),1220901 ( chr2 : chr9 ),AMLM12015WPS ( chr2 : chr9 ),AML203 ( chr2 : chr9 )
2        3 AMLM12001KP ( chr2 : chr20 ),1123801 ( chr2 : chr20 ),AMLM12020M-B ( chr2 : chr20 )
3        2 AMLM12001KP ( chr4 : chr17 ),1031901 ( chr4 : chr17 )

akrun · Accepted Answer

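We can use strsplit to split each string on the commas, keep the distinct pieces with unique, and count them with length: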

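res <- do.call(rbind, lapply(strsplit(mydf[, 2], ","),  # split each row on commas
  function(x) {
    x1 <- unique(x)  # drop duplicate items within the row
    # toString() joins with ", "; use paste(x1, collapse = ",") to keep the original separator
    data.frame(new.freq = length(x1),
               uniq.customer_sample_id = toString(x1),
               stringsAsFactors = FALSE)
  }))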

res
#   new.freq uniq.customer_sample_id
# 1        5 AMLM12001KP ( chr2 : chr9 ), 1028701 ( chr2 : chr9 ), 1220901 ( chr2 : chr9 ), AMLM12015WPS ( chr2 : chr9 ), AML203 ( chr2 : chr9 )
# 2        3 AMLM12001KP ( chr2 : chr20 ), 1123801 ( chr2 : chr20 ), AMLM12020M-B ( chr2 : chr20 )
# 3        2 AMLM12001KP ( chr4 : chr17 ), 1031901 ( chr4 : chr17 )
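
A possible variant, sketched under the assumption that the joined string should keep the plain "," separator exactly as in the desired result (toString() inserts ", " instead). It builds the list of unique pieces once, counts them with lengths(), and rejoins them with paste(collapse = ","). The trimws() call is an added precaution in case some items have spaces around the commas, and the name res2 is hypothetical.

```r
# Same data as in the question (columns freq and customer_sample_id)
mydf <- data.frame(
  freq = c(6, 3, 3),
  customer_sample_id = c(
    "AMLM12001KP ( chr2 : chr9 ),1028701 ( chr2 : chr9 ),1028701 ( chr2 : chr9 ),1220901 ( chr2 : chr9 ),AMLM12015WPS ( chr2 : chr9 ),AML203 ( chr2 : chr9 )",
    "AMLM12001KP ( chr2 : chr20 ),1123801 ( chr2 : chr20 ),AMLM12020M-B ( chr2 : chr20 )",
    "AMLM12001KP ( chr4 : chr17 ),AMLM12001KP ( chr4 : chr17 ),1031901 ( chr4 : chr17 )"
  ),
  stringsAsFactors = FALSE
)

# Split each row on literal commas, trim stray spaces (assumption), drop duplicates within the row
parts <- lapply(strsplit(mydf$customer_sample_id, ",", fixed = TRUE),
                function(x) unique(trimws(x)))

res2 <- data.frame(
  new.freq = lengths(parts),                          # number of unique items per row
  uniq.customer_sample_id = vapply(parts, paste, character(1),
                                   collapse = ","),   # rejoin with "," as in the input
  stringsAsFactors = FALSE
)
res2
```

Because lengths() and vapply() work on the whole list at once, this avoids building one small data frame per row and binding them with do.call(rbind, ...).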
