shoonya

Reputation: 301

replace duplicate characters from strings

I am trying to remove duplicate characters from strings.

dput(test)
c("APAAAAAAAAAAAPAAPPAPAPAAAAAAAAAAAAAAAAAAAAAAAAPPAPAAAAAAPPAPAAAPAPAAAAP", 
"AAA", "P", "P", "A", "P", "P", "APPPPPA", "A", "P", "AA", "PP", 
"PPA", "P", "P", "A", "P", "APAP", "P", "PA")

I wrote a function to sort the characters within each string:

strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")

Then I use gsub to collapse consecutive repeated characters:

gsub("(.)\\1{2,}", "\\1", strSort(test))

This gives the following output:
 [1] "AP"   "A"    "P"    "P"    "A"    "P"    "P"    "AAP"  "A"    "P"    "AA"   "PP"   "APP"  "P"    "P"    "A"    "P"    "AAPP" "P"    "AP"

The output should contain each character only once, i.e. at most one A and one P per string.

Upvotes: 0

Views: 357

Answers (3)

Ronak Shah

Reputation: 388862

Using regex, you can do:

gsub('(?:(.)(?=(.*)\\1))', '', test, perl = TRUE)

#[1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "PA" "A"  "P"  "A"  "P"  "PA"
#[14] "P"  "P"  "A"  "P"  "AP" "P"  "PA"

The regex has been taken from here.
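
As a rough sketch of why this pattern works: the lookahead `(?=.*\\1)` succeeds only when the captured character occurs again later in the string, so every occurrence except the last one is deleted. Characters therefore survive in the order of their last appearance, which is why some results read "PA" rather than "AP". The vector `x` and the name `dedup` below are illustrative only, and the pattern is a simplified form of the one above (without the extra groups).

```r
# Hypothetical sample input, a subset of the strings in the question
x <- c("APPPPPA", "PPA", "APAP", "AAA")

# Delete any character that appears again later in the same string;
# only the last occurrence of each character is kept
dedup <- gsub("(.)(?=.*\\1)", "", x, perl = TRUE)
dedup
# [1] "PA" "PA" "AP" "A"
```

If alphabetical order matters, the result would still need to be sorted afterwards.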

Upvotes: 2

ThomasIsCoding

Reputation: 101189

Here is another option using utf8ToInt + intToUtf8:

> sapply(test, function(x) intToUtf8(sort(unique(utf8ToInt(x)))), USE.NAMES = FALSE)
 [1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "AP" "A"  "P"  "A"  "P"  "AP" "P"  "P" 
[16] "A"  "P"  "AP" "P"  "AP"
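
To make the pipeline easier to follow, here is the same idea unrolled for a single string. This is only an illustrative sketch; the variable names `s`, `codes`, `uniq` and `res` are assumptions, not part of the answer. Since utf8ToInt is meant for one string at a time, the answer wraps it in sapply.

```r
s <- "APPPPPA"

# Convert the string to integer code points: 65 80 80 80 80 80 65
codes <- utf8ToInt(s)

# Drop repeated code points, then sort them: 65 80
uniq <- sort(unique(codes))

# Convert the code points back into a single string
res <- intToUtf8(uniq)
res
# [1] "AP"
```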

Upvotes: 0

akrun

Reputation: 887028

In the strsplit output, we need to apply unique to the sorted elements:

sapply(strsplit(test, ""), function(x) 
       paste(unique(sort(x)), collapse=""))
#[1] "AP" "A"  "P"  "P"  "A"  "P"  "P"  "AP" "A"  "P"  "A"  "P"  "AP" "P"  "P"  "A"  "P"  "AP" "P"  "AP"
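
For comparison, the gsub attempt from the question could probably be kept with a small change. The quantifier `{2,}` only matches a character followed by at least two more copies of itself (runs of three or more), which is why pairs such as "AA" and "PP" were left in place. A minimal sketch, assuming the strSort helper from the question and a hypothetical sample vector `x`:

```r
# Helper from the question: sort the characters within each string
strSort <- function(x)
  sapply(lapply(strsplit(x, NULL), sort), paste, collapse = "")

# Hypothetical sample input, a subset of the strings in the question
x <- c("APPPPPA", "AA", "PP", "APAP", "P")

# "\\1+" collapses any run of two or more identical characters;
# after sorting, all copies of a character are adjacent
collapsed <- gsub("(.)\\1+", "\\1", strSort(x))
collapsed
# [1] "AP" "A"  "P"  "AP" "P"
```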

Upvotes: 1
