Reputation: 383
I am trying to adapt this code to my case:
word_vec <- paste(c('bonkobuns ', 'exomunch ', 'calipodians ',
'relimited '), collapse="|")
gsub(word_vec, '', df1$text)
However, I receive this error:
Invalid use of repetition operators
The problem is with the following expressions:
c("c++", "c#", "vb.net", "objective-c")
How can I include them into the word list?
Upvotes: -2
Views: 646
Reputation: 47330
@MrFlick's solution is the most idiomatic and efficient one. Nevertheless, if we want to make it work with fixed = TRUE, we could use Reduce:
Reduce(function(x,y) gsub(y,"",x,fixed=TRUE), wordVec, textVec)
# [1] "use the  tag" "use the  tag" "use the  tag" "use the  tag"
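For reference, here is a self-contained sketch of the same idea. It reuses wordVec and textVec from @MrFlick's answer; the name cleaned and the final whitespace clean-up are additions made here for illustration, not part of the original one-liner.

```r
wordVec <- c("c++", "c#", "vb.net", "objective-c")
textVec <- paste("use the", wordVec, "tag")

# Reduce() starts from textVec and calls gsub() once per word:
# x is the text so far, y is the next word to remove as a literal string.
cleaned <- Reduce(function(x, y) gsub(y, "", x, fixed = TRUE), wordVec, textVec)

# Removing a word leaves two adjacent spaces; collapse them if that matters.
cleaned <- gsub(" +", " ", cleaned)
print(cleaned)
# [1] "use the tag" "use the tag" "use the tag" "use the tag"
```

Because fixed = TRUE treats each word as plain text, nothing needs escaping, at the cost of one pass over the text per word.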
Upvotes: 2
Reputation: 206411
If you have
wordVec <- c("c++", "c#", "vb.net", "objective-c")
You will need to escape special characters such as +, which is the one behind the error message, and also ones like . to be safe. Here we add a backslash in front of those characters while building the expression.
wordList <- paste(gsub("([+.])","\\\\\\1", wordVec), collapse="|")
cat(wordList) # cat() shows the pattern itself, without R's extra string escapes
# c\+\+|c#|vb\.net|objective-c
And we can test with
textVec <- paste("use the", wordVec, "tag")
# [1] "use the c++ tag" "use the c# tag"
# [3] "use the vb.net tag" "use the objective-c tag"
gsub(wordList, "", textVec)
# [1] "use the  tag" "use the  tag" "use the  tag" "use the  tag"
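If the word list may contain other metacharacters as well (parentheses, *, ?, and so on), the same trick can be widened. The sketch below is one possible generalisation, not part of the answer above: escape_regex is a hypothetical helper name, and it uses perl = TRUE only so that the bracket expression listing the metacharacters has well-defined escaping rules.

```r
# Hypothetical helper: put a backslash in front of every regex metacharacter.
escape_regex <- function(x) {
  gsub("([.|()\\\\^{}+$*?\\[\\]])", "\\\\\\1", x, perl = TRUE)
}

wordVec  <- c("c++", "c#", "vb.net", "objective-c")
wordList <- paste(escape_regex(wordVec), collapse = "|")
cat(wordList, "\n")
# c\+\+|c#|vb\.net|objective-c

textVec <- paste("use the", wordVec, "tag")
result  <- gsub(wordList, "", textVec)
print(result)
# [1] "use the  tag" "use the  tag" "use the  tag" "use the  tag"
```

For this particular word list the helper produces the same pattern as the [+.] version, so the choice only matters once other special characters show up.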
Upvotes: 2
Reputation: 12165
+ and . are special characters in regex. The particular error you're getting comes from the ++: a + means "match the preceding item one or more times". Applying a repetition operator to another repetition operator doesn't make sense, hence the error.
To match a literal + or . in a regex, you need to escape it by putting a backslash in front of it. Note that in an R string you need two backslashes (\\), since the backslash itself has to be escaped too.
Example: C++ should be written as C\\+\\+ or C\\+{2}
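A quick way to check both spellings is to run them through grepl. The test vector x below is made up for illustration; the last line shows fixed = TRUE as an alternative that skips regex interpretation altogether.

```r
x <- c("I like C++", "I like C", "C+ is a grade")

# Once parsed, the R string "C\\+\\+" holds the five characters C\+\+
n <- nchar("C\\+\\+")
print(n)
# [1] 5

escaped  <- grepl("C\\+\\+", x)            # two escaped pluses
repeated <- grepl("C\\+{2}", x)            # one escaped plus, repeated exactly twice
literal  <- grepl("C++", x, fixed = TRUE)  # plain substring match, no regex
print(escaped)
# [1]  TRUE FALSE FALSE
```

All three calls agree on this input: only the first element contains a literal C++.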
Upvotes: 1