Nico.444

Reputation: 77

apply strsplit to multiple columns

I want to remove duplicate values within comma-separated strings.

It works for a single column using:

df$column <- sapply(strsplit(df$column, ",", fixed = TRUE),
                    function(x) paste(unique(x), collapse = ","))

When I try to use it on multiple columns, I always get a "non-character argument" error.

Upvotes: 2

Views: 732

Answers (1)

akrun

Reputation: 887148

If the column is a factor, wrap it in as.character() first, since strsplit() only accepts character vectors:

sapply(strsplit(as.character(df$column), ",", fixed = TRUE),
      function(x) paste(unique(x), collapse = ","))
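
A minimal sketch of why the conversion matters. The factor `f` is made-up data for illustration and is not from the question:

```r
# Hypothetical data: a factor, e.g. as created with stringsAsFactors = TRUE
f <- factor(c("a,b,a", "c,c"))

# strsplit() rejects input that is not a character vector;
# capture the error message instead of stopping
res <- tryCatch(strsplit(f, ",", fixed = TRUE),
                error = function(e) conditionMessage(e))
res
# "non-character argument"

# After converting to character, the split and de-duplication work
out <- sapply(strsplit(as.character(f), ",", fixed = TRUE),
              function(x) paste(unique(x), collapse = ","))
out
# "a,b" "c"
```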

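Passing several columns to strsplit() at once fails for the same reason: a data frame is not a character vector. To apply this to multiple columns, loop over the columns of interest with lapply(), apply the same function to each one, and assign the result back to those columns: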

colsOfInterest <- c('column1', 'column2')
df[colsOfInterest] <- lapply(df[colsOfInterest], function(x)
  sapply(strsplit(as.character(x), ",", fixed = TRUE),
         function(y) paste(unique(y), collapse = ",")))
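
The same approach as a self-contained sketch. The sample data frame and the helper name `dedupe_csv` are assumptions made up for illustration. The helper uses vapply() instead of sapply() so the result is always a character vector:

```r
# Hypothetical data frame; column names follow the example above
df <- data.frame(column1 = c("a,b,a", "x,x,y"),
                 column2 = c("1,1,2", "3,3,3"),
                 other   = c(10, 20),
                 stringsAsFactors = TRUE)

# Hypothetical helper: drop repeated items inside each comma-separated string
dedupe_csv <- function(x) {
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  vapply(parts, function(y) paste(unique(y), collapse = ","), character(1))
}

# Only the listed columns are rewritten; 'other' is left untouched
colsOfInterest <- c("column1", "column2")
df[colsOfInterest] <- lapply(df[colsOfInterest], dedupe_csv)
df
```

With this sample data, column1 should become "a,b" and "x,y", and column2 should become "1,2" and "3".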

Upvotes: 3
