DCRubyHound

Remove duplicate entries in a cell - R

I searched high and low on here and tried the duplicated and unique functions for what I'm about to ask, but couldn't get anything to work. Let's say I have a data frame named company with a variable state. When I collapse the rows, I'm left with this output in one of the state observations:

PA;PA;PA;TX;TX

How can I remove the duplicates inside the cell (and across the entire vector, for that matter) so that it looks like this:

PA;TX

I have no problem removing duplicate rows, but can't seem to do it within the cells themselves.
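The reason unique() and duplicated() don't help here is that they compare whole elements: a collapsed string such as "PA;PA;PA;TX;TX" is a single value to them. If the duplicates are introduced by the collapse step itself, one option is to deduplicate while collapsing. This is a minimal sketch that assumes a hypothetical long-format data frame `raw`, with an `id` column identifying each company and one `state` per row; those names are illustrative and not from the question.

```r
# Hypothetical long-format data: one row per (company id, state) pair
raw <- data.frame(
  id    = c(1, 1, 1, 1, 1, 2),
  state = c("PA", "PA", "PA", "TX", "TX", "NY"),
  stringsAsFactors = FALSE
)

# unique() compares whole elements, so an already-collapsed string is one value
unique("PA;PA;PA;TX;TX")  # still "PA;PA;PA;TX;TX"

# Drop repeated states within each id *before* pasting them together
collapsed <- aggregate(
  state ~ id, data = raw,
  FUN = function(s) paste(unique(s), collapse = ";")
)
collapsed$state  # "PA;TX" for id 1, "NY" for id 2
```

This only applies if you still have the uncollapsed rows; otherwise, clean up the collapsed strings as in the answer.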

Answers (1)

ulfelder

This works for a single string:

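x <- "PA;PA;PA;TX;TX"
x2 <- strsplit(x, ";")           # list holding one vector: "PA" "PA" "PA" "TX" "TX"
x3 <- unlist(x2)                 # flatten the list into a plain character vector
x4 <- unique(x3)                 # drop repeats, keeping first occurrences: "PA" "TX"
x5 <- paste(x4, collapse = ";")  # join back into one string: "PA;TX"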
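The same steps can be wrapped in a small function. This is a sketch, not part of the original answer: `dedupe_cell` and its `sep` argument are hypothetical names. It passes `fixed = TRUE` to strsplit() because the split pattern is otherwise treated as a regular expression, which matters for separators such as "|". Because of the unlist() step, it is meant for one string at a time.

```r
# Hypothetical helper wrapping the steps above; handles a single string
dedupe_cell <- function(x, sep = ";") {
  # fixed = TRUE treats sep literally instead of as a regular expression
  parts <- unlist(strsplit(x, sep, fixed = TRUE))
  # unique() keeps the order in which each value first appears
  paste(unique(parts), collapse = sep)
}

dedupe_cell("PA;PA;PA;TX;TX")       # "PA;TX"
dedupe_cell("TX;PA;TX")             # "TX;PA"
dedupe_cell("PA|PA|TX", sep = "|")  # "PA|TX"
```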

If you want to do it for the whole vector company$state, you could roll all that up into one call to sapply:

sapply(company$state, function(x) paste(unique(unlist(strsplit(x, ";"))), collapse = ";"))
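A variant of the sapply() call, offered as a sketch rather than the answer's own code. Because strsplit() is already vectorised, it can split the whole column in one call, and vapply() with `character(1)` checks that each cell yields exactly one string. This also sidesteps a side effect of sapply() on a character vector: by default (`USE.NAMES = TRUE`) it names each result after its input string. The data frame below is made-up sample data standing in for `company`.

```r
# Made-up sample data standing in for the question's `company` data frame
company <- data.frame(
  state = c("PA;PA;PA;TX;TX", "NY", "CA;CA"),
  stringsAsFactors = FALSE  # keep state as character (the default since R 4.0.0)
)

# One strsplit() call returns a list with one character vector per cell
parts <- strsplit(company$state, ";", fixed = TRUE)

# Deduplicate each cell's pieces and paste them back; `parts` has no names,
# so the result is an unnamed character vector ready to go back into the column
company$state <- vapply(parts, function(p) paste(unique(p), collapse = ";"),
                        character(1))

company$state  # "PA;TX" "NY" "CA"
```

Note that a missing value comes back as the string "NA" with either approach, so handle NAs separately if the column contains any.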
