user8831872

Reputation: 383

Invalid use of repetition operators

I am trying to adapt this code to my case:

word_vec <- paste(c('bonkobuns ', 'exomunch ', 'calipodians ',
                    'relimited '), collapse = "|")
gsub(word_vec, '', df1$text)

However, I receive this error:

Invalid use of repetition operators

The problem is with the following expressions:

c("c++", "c#", "vb.net", "objective-c")

How can I include them in the word list?

Upvotes: -2

Views: 646

Answers (3)

moodymudskipper

Reputation: 47330

@MrFlick's solution is the most idiomatic and efficient. Nevertheless, if we want to make it work with fixed = TRUE, which matches each word literally and so needs no escaping, we can use Reduce:

Reduce(function(x,y) gsub(y,"",x,fixed=TRUE), wordVec, textVec)

# [1] "use the  tag" "use the  tag" "use the  tag" "use the  tag"
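For anyone who wants to run this without the definitions from the other answer, here is a self-contained sketch. The wrapper name stripWords is hypothetical, introduced only for illustration; wordVec and textVec are built the same way as in @MrFlick's answer.

```r
wordVec <- c("c++", "c#", "vb.net", "objective-c")
textVec <- paste("use the", wordVec, "tag")

# Hypothetical wrapper (not from the original answer).
# Reduce() starts from `text` and applies one gsub() per word:
# x is the text processed so far, y is the next literal word to strip.
stripWords <- function(text, words) {
  Reduce(function(x, y) gsub(y, "", x, fixed = TRUE), words, text)
}

cleaned <- stripWords(textVec, wordVec)
print(cleaned)
# [1] "use the  tag" "use the  tag" "use the  tag" "use the  tag"
```

Because fixed = TRUE treats every word as a plain string, this makes one pass over the text per word instead of a single pass with one combined pattern.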

Upvotes: 2

MrFlick

Reputation: 206411

If you have

wordVec <- c("c++", "c#", "vb.net", "objective-c")

You will need to escape special characters: + specifically, since that is what the error message is about, but also things like . to be safe. Here we add a backslash in front of those characters while building the expression.

wordList <- paste(gsub("([+.])","\\\\\\1", wordVec), collapse="|")
cat(wordList) # cat() shows the pattern without R's extra string escapes
# c\+\+|c#|vb\.net|objective-c

And we can test with

textVec <- paste("use the", wordVec, "tag")
# [1] "use the c++ tag"         "use the c# tag"         
# [3] "use the vb.net tag"      "use the objective-c tag"
gsub(wordList, "", textVec)
# [1] "use the  tag" "use the  tag" "use the  tag" "use the  tag"
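If the word list may later contain other metacharacters (for example * or parentheses), the same idea can be widened. The following is only a sketch: escapeRegex is a hypothetical helper name, and the set of characters it escapes is an assumption about what counts as "special" rather than something taken from the answer above.

```r
# Hypothetical helper: put a backslash in front of common regex
# metacharacters. perl = TRUE is used so that \[ \] and \\ inside the
# character class are unambiguously treated as escapes.
escapeRegex <- function(x) {
  gsub("([.+*?^$(){}|\\[\\]\\\\])", "\\\\\\1", x, perl = TRUE)
}

wordVec <- c("c++", "c#", "vb.net", "objective-c")
wordList <- paste(escapeRegex(wordVec), collapse = "|")
cat(wordList, "\n")
# c\+\+|c#|vb\.net|objective-c

textVec <- paste("use the", wordVec, "tag")
result <- gsub(wordList, "", textVec)
print(result)
# [1] "use the  tag" "use the  tag" "use the  tag" "use the  tag"
```

For the four words in the question this builds exactly the same pattern as the narrower "([+.])" version.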

Upvotes: 2

divibisan

Reputation: 12165

+ and . are special characters in regex. The particular error you're getting comes from the ++: + means "match the preceding character one or more times". Trying to repeat a repetition operator doesn't make sense, hence the error.

To match a literal + or . in a regex, you need to escape it by putting a backslash in front of it. In an R string literal this is written as two backslashes (\\), because the backslash itself has to be escaped.

Example:

C++ should be written as C\\+\\+ or C\\+{2}
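A small sketch to check both spellings; the sample sentence and variable names are made up for illustration. It also shows fixed = TRUE as a way to skip escaping entirely when the pattern is a plain string.

```r
x <- "I write C++ daily"  # made-up sample text

# writeLines() prints what the regex engine actually receives:
# the doubled backslashes in the R literal become single ones.
writeLines("C\\+\\+")
# C\+\+

viaEscapes  <- gsub("C\\+\\+", "", x)            # each + escaped
viaInterval <- gsub("C\\+{2}", "", x)            # one escaped +, repeated twice
viaFixed    <- gsub("C++", "", x, fixed = TRUE)  # literal match, no regex

print(viaEscapes)
# [1] "I write  daily"
```

All three calls return the same string here.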

Upvotes: 1
