Reputation: 1306
I'm working with some character data in R, and some of the strings contain (foo)(foo) in the middle. Is there any way to automatically find such repetitions and collapse them, leaving a single (foo) in the same position?
I'm wondering whether a possible solution is to use strsplit on ), check whether adjacent pieces are equal, and then re-append the ). Would this work?
Ex. string: "abc def (foo)(foo) abc def"
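A rough sketch of the split-and-compare idea, under some assumptions: splitting on ) alone yields the pieces "abc def (foo" and "(foo", which are not equal, so this version instead tokenizes the string into parenthesized groups and the text between them, then drops adjacent duplicates with rle. The helper name dedupe_parens is hypothetical, and the sketch assumes balanced, non-nested parentheses.

```r
# Hypothetical helper: collapse adjacent identical parenthesized groups.
# Assumes parentheses are balanced and not nested.
dedupe_parens <- function(s) {
  # Tokens are either "(...)" groups or runs of non-parenthesis text.
  tokens <- regmatches(s, gregexpr("\\([^()]*\\)|[^()]+", s))[[1]]
  # rle() merges runs of identical adjacent tokens; keep one value per run.
  paste(rle(tokens)$values, collapse = "")
}

result <- dedupe_parens("abc def (foo)(foo) abc def")
print(result)
```

Because the tokens keep their parentheses, nothing has to be re-appended afterwards.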
Upvotes: 1
Views: 165
Reputation: 2151
You could use a Perl-compatible regular expression substitution within R, as in the following example:
test <- "abc def (foo)(foo) abc def"
gsub('(\\(\\w+\\))\\1','\\1',test,perl=TRUE)
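To see what the pattern above does and does not catch, here is a small sketch applied to a vector. Only the first input comes from the question; the other two are made-up test cases. It also shows a variant with \1+ (an assumption about what you might want, not part of the call above) that collapses runs of three or more in one pass.

```r
# Only the first string is from the question; the rest are illustrative.
x <- c("abc def (foo)(foo) abc def",
       "abc (foo)(bar) def",
       "(foo)(foo)(foo) end")

# (\(\w+\)) captures one parenthesized word;
# \1 requires an identical copy immediately after it.
once <- gsub("(\\(\\w+\\))\\1", "\\1", x, perl = TRUE)

# \1+ lets a single match swallow a run of any length.
runs <- gsub("(\\(\\w+\\))\\1+", "\\1", x, perl = TRUE)

print(once)
print(runs)
```

With the plain \1 version a triple such as (foo)(foo)(foo) is reduced to (foo)(foo), because each match consumes exactly two copies; the \1+ variant reduces it to a single (foo). Groups that differ, such as (foo)(bar), are left alone by both.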
Alternatively, you can run a Perl one-liner to clean the data beforehand:
echo "abc def (foo)(foo) abc def" | perl -ne 's/(\(\w+\))\1/$1/gi;print'
Upvotes: 3
Reputation: 2818
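Here is one way to keep only the first element of a run of repeated substrings, where x is your character vector. Note that the pattern is generic: it collapses any immediately repeated substring, not only parenthesized ones.
gsub("(.+)\\1+", "\\1", x, perl = TRUE)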
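A short comparison may help when deciding whether the generic pattern is safe for your data. The second input and the restricted pattern are assumptions added for illustration: the restricted version only matches repeated (…) groups with no parentheses inside them.

```r
# First string is from the question; the second is a made-up case
# with a doubled letter outside the parentheses.
x <- c("abc def (foo)(foo) abc def", "hello (foo)(foo)")

# Generic: any substring followed by one or more copies of itself.
generic <- gsub("(.+)\\1+", "\\1", x, perl = TRUE)

# Restricted (illustrative): only repeated parenthesized groups.
restricted <- gsub("(\\([^()]*\\))\\1+", "\\1", x, perl = TRUE)

print(generic)
print(restricted)
```

On the question's string both give the same result, but the generic pattern also turns "hello" into "helo", so the restricted form is the safer choice if the surrounding text can contain doubled characters.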
HTH
Upvotes: 2