riders994

Reputation: 1306

Removing repeating substrings from within a string in R

Is there any way (using regular-expression functions such as gsub, or other means) to remove repeated substrings from a string?

Essentially:

a = c("abc, def, def, abc")
f(a)
#[1] "abc, def"

Upvotes: 1

Views: 207

Answers (3)

droopy

Reputation: 2818

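You can also use this function based on gsub. I was not able to do it directly with a single regular expression, so the function calls itself until no repeats remain. Note that the pattern matches any repeated substring, not only whole comma-separated items, so it can also alter items that merely share characters.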

f <- function(x) {
  # Drop the second copy of a repeated substring, then recurse while any repeat remains
  pattern <- "(.+)(.+)?\\1"
  x <- gsub(pattern, "\\1\\2", x, perl = TRUE)
  if (grepl(pattern, x, perl = TRUE)) f(x) else x
}
b <- f(a)
b
#[1] "abc, def"

hth

Upvotes: 0

dickoa

Reputation: 18437

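You can also use stringr::str_extract_all to pull out the words, keep the unique ones, and paste them back into a single string:

library(stringr)
paste(unique(unlist(str_extract_all(a, '\\w+'))), collapse = ", ")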

Upvotes: 2

Arun

Reputation: 118789

One obvious way is to strsplit the string, keep the unique pieces, and stitch them back together.

paste0(unique(strsplit(a, ",[ ]*")[[1]]), collapse=", ")
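The one-liner above handles a single string, because it takes only the first element of the strsplit result. If a holds several strings, the same split/unique/paste idea can be applied element by element. A minimal sketch, assuming items are separated by a comma and optional spaces; the helper name dedupe_tokens and its split and sep arguments are made up for illustration:

```r
# Hypothetical helper: de-duplicate comma-separated items in each string.
# 'split' is the separator regex, 'sep' is the separator used when re-joining.
dedupe_tokens <- function(x, split = ",[ ]*", sep = ", ") {
  parts <- strsplit(x, split)  # list with one character vector per input string
  # unique() keeps the first occurrence of each item, so order is preserved
  vapply(parts, function(p) paste(unique(p), collapse = sep), character(1))
}

a <- c("abc, def, def, abc", "x, x, y")
dedupe_tokens(a)
#[1] "abc, def" "x, y"
```

Using vapply with character(1) makes the function fail loudly if anything other than one string per input comes back.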

Upvotes: 3
