riders994

Reputation: 1306

String Editing in R - Taking Out Repetition

I'm working with character data in R, and some strings contain a repeated group such as (foo)(foo) in the middle. Is there any way to find those repetitions automatically and collapse them to a single (foo) in the same position?

One possible approach would be to split the string on ")" with strsplit, check adjacent pieces for equality, and then re-append the ")". Would that work?

Example string: "abc def (foo)(foo) abc def"

Upvotes: 1

Views: 165

Answers (2)

Itamar

Reputation: 2151

You could use a Perl-compatible regular expression with a backreference in R, as in the following example:

test <- "abc def (foo)(foo) abc def"
gsub('(\\(\\w+\\))\\1','\\1',test,perl=TRUE)
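For what it's worth, here is a small sketch of the same idea applied to a whole character vector. The helper name dedup_parens is just made up for illustration, and I've assumed you may also want to collapse three or more repeats (hence \\1+) and to allow any characters other than parentheses inside the group (hence [^()]+ instead of \\w+):

```r
# Hypothetical helper: collapse consecutive identical "(...)" groups into one.
dedup_parens <- function(s) {
  # (\([^()]+\)) captures one parenthesised group;
  # \1+ matches one or more immediate repeats of exactly that group.
  gsub("(\\([^()]+\\))\\1+", "\\1", s, perl = TRUE)
}

x <- c("abc def (foo)(foo) abc def",
       "(bar)(bar)(bar) end",
       "no repeats (foo)(bar)")

result <- dedup_parens(x)
print(result)
```

Since gsub is vectorised, no loop is needed, and strings without a repeated group come back unchanged.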

Alternatively, you can run a Perl one-liner to clean the data beforehand:

printf '%s\n' "abc def (foo)(foo) abc def" | perl -pe 's/(\(\w+\))\1/$1/g'

Upvotes: 3

droopy

Reputation: 2818

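Here is a possibility that keeps only the first element of any repeated sequence, where x is your character vector. Note that (.+) matches any repeated run, so it will also collapse doubled characters elsewhere in the string:

gsub("(.+)\\1+", "\\1", x, perl = TRUE)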
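To show how broad this pattern is, here is a quick comparison on a made-up input that contains a doubled letter as well as the repeated group. The variable names are only for illustration, and the narrower pattern is the parenthesis-anchored one from the other answer with \\1+ added:

```r
x <- "book (foo)(foo)"

# Broad pattern: any repeated run is collapsed, including the "oo" in "book".
broad <- gsub("(.+)\\1+", "\\1", x, perl = TRUE)

# Narrow pattern: only a repeated "(word)" group is collapsed.
narrow <- gsub("(\\(\\w+\\))\\1+", "\\1", x, perl = TRUE)

print(broad)   # "bok (foo)"
print(narrow)  # "book (foo)"
```

If your data can contain ordinary doubled letters, the narrower pattern is probably the safer choice.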

HTH

Upvotes: 2
