Reputation: 301
I am trying to remove duplicate characters from strings.
dput(test)
c("APAAAAAAAAAAAPAAPPAPAPAAAAAAAAAAAAAAAAAAAAAAAAPPAPAAAAAAPPAPAAAPAPAAAAP",
"AAA", "P", "P", "A", "P", "P", "APPPPPA", "A", "P", "AA", "PP",
"PPA", "P", "P", "A", "P", "APAP", "P", "PA")
I created a function to sort the characters of each string:
strSort <- function(x)
sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")
Then I use gsub to remove consecutive repeated characters:
gsub("(.)\\1{2,}", "\\1", strSort(test))
This gives the output:
[1] "AP" "A" "P" "P" "A" "P" "P" "AAP" "A" "P" "AA" "PP" "APP" "P" "P" "A" "P" "AAPP" "P" "AP"
The output should contain at most one A and/or one P per string.
Upvotes: 0
Views: 357
Reputation: 388862
Using regex, you can do:
gsub('(?:(.)(?=(.*)\\1))', '', test, perl = TRUE)
#[1] "AP" "A" "P" "P" "A" "P" "P" "PA" "A" "P" "A" "P" "PA"
#[14] "P" "P" "A" "P" "AP" "P" "PA"
The regex has been taken from here.
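As a rough illustration of what that pattern does: the lookahead removes every character that appears again later in the string, so the last occurrence of each character survives and the result is not sorted (hence "PA" rather than "AP" for some inputs). The sketch below uses a simplified form of the pattern without the extra groups. It also shows that the question's own sort-then-gsub approach seems to work once the quantifier {2,} (three or more in a row) is replaced by + (two or more in a row). The vector x and the helper sort_chars are hypothetical names, and x is only a small subset of the question's data.

```r
# Hypothetical sample input, a subset of the question's `test` vector
x <- c("APPPPPA", "AA", "PPA", "APAP", "P")

# Drop every character that occurs again later in the string,
# which keeps only the last occurrence of each character
dedup_last <- gsub("(.)(?=.*\\1)", "", x, perl = TRUE)
dedup_last
# "PA" "A" "PA" "AP" "P"

# Sort the characters first (same idea as strSort in the question),
# then collapse each run of two or more repeats to a single character
sort_chars <- function(s) {
  sapply(lapply(strsplit(s, NULL), sort), paste, collapse = "")
}
dedup_sorted <- gsub("(.)\\1+", "\\1", sort_chars(x))
dedup_sorted
# "AP" "A" "AP" "AP" "P"
```

If the order of characters in the result matters, the sorted variant is probably the safer choice.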
Upvotes: 2
Reputation: 101189
Here is another option using utf8ToInt + intToUtf8:
> sapply(test, function(x) intToUtf8(sort(unique(utf8ToInt(x)))), USE.NAMES = FALSE)
[1] "AP" "A" "P" "P" "A" "P" "P" "AP" "A" "P" "A" "P" "AP" "P" "P"
[16] "A" "P" "AP" "P" "AP"
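To make the steps easier to follow, here is the same idea unrolled for a single string. This is only a sketch; the variable names codes, uniq and result are made up for illustration.

```r
# Convert the string to integer code points
codes <- utf8ToInt("PPA")      # 80 80 65

# Drop repeated code points, then sort them
uniq <- sort(unique(codes))    # 65 80

# Convert the remaining code points back into one string
result <- intToUtf8(uniq)
result
# "AP"
```

Because utf8ToInt accepts only a single string, the sapply wrapper in the answer is needed to apply it to each element of a vector.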
Upvotes: 0
Reputation: 887028
In the strsplit output, we need to use unique on the sorted elements:
sapply(strsplit(test, ""), function(x)
paste(unique(sort(x)), collapse=""))
#[1] "AP" "A" "P" "P" "A" "P" "P" "AP" "A" "P" "A" "P" "AP" "P" "P" "A" "P" "AP" "P" "AP"
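If this is needed in more than one place, the approach could be wrapped in a small helper. The sketch below is one possible way to do it; unique_chars and its sorted argument are hypothetical names, not part of base R. With sorted = FALSE the characters are kept in order of first appearance instead.

```r
# unique_chars is a hypothetical helper name, not from the original answers
unique_chars <- function(x, sorted = TRUE) {
  vapply(strsplit(x, ""), function(ch) {
    u <- unique(ch)              # keep the first occurrence of each character
    if (sorted) u <- sort(u)     # optionally put the characters in sorted order
    paste(u, collapse = "")
  }, character(1))
}

res_sorted <- unique_chars(c("PPA", "APAP", "AAA"))
res_sorted
# "AP" "AP" "A"

res_first <- unique_chars(c("PPA", "APAP", "AAA"), sorted = FALSE)
res_first
# "PA" "AP" "A"
```

vapply is used here instead of sapply so that the result is always a character vector.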
Upvotes: 1