Joshua

Reputation: 762

R: Detecting emoticons using regex

I have compiled a list of emoticons that I want to look for in a text. For example, the list of emoticons could be:

:)
:-(
):
:S
o_O
=D

And the text can be quite "difficult", that is, not all emoticons are separated by spaces:

text:S text=D. text :-(. text o_O text :)

How do I go about replacing these smileys with another string? I have tried some rather simple uses of gsub():

emoticons <- c(":)",":-(","):",":S","o_O","=D")
texts <- "text:S text=D. text :-(. text o_O text :)"

for(x in 1:length(emoticons)) 
  texts2 <- gsub(emoticons[x], " XXX ", texts, fixed = TRUE)

But this doesn't go all the way, it only replaces some of the emoticons.

Upvotes: 1

Views: 580

Answers (1)

Pierre L

Reputation: 28461

Try escaping the regex metacharacters in your emoticon patterns with backslashes so they are matched literally. Then paste the patterns together with "|" so that a single gsub() call replaces all of them:

emoticons <- c(":\\)",":-\\(","\\):",":S","o_O","=D")
gsub(paste0(emoticons, collapse="|"), " XXX ", texts)
#[1] "text XXX  text XXX . text  XXX . text  XXX  text  XXX "
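
As a side note, the loop in the question seems to fail for a different reason than metacharacters: every iteration reads the original texts and overwrites texts2, so only the last replacement ("=D") survives. Since fixed = TRUE already matches the patterns literally, carrying the result forward should be enough. The sketch below shows that variant, plus a way to build the alternation without escaping by hand, assuming perl = TRUE so that \Q...\E quoting is available. The names loop_result, ordered, pattern and regex_result are illustrative only.

```r
emoticons <- c(":)", ":-(", "):", ":S", "o_O", "=D")
texts <- "text:S text=D. text :-(. text o_O text :)"

# Variant 1: keep the loop, but feed each result into the next iteration.
loop_result <- texts
for (e in emoticons) {
  loop_result <- gsub(e, " XXX ", loop_result, fixed = TRUE)
}

# Variant 2: one regex call. Sorting longest-first is a precaution so that a
# longer emoticon is tried before any shorter one it might contain.
ordered <- emoticons[order(nchar(emoticons), decreasing = TRUE)]
# \Q...\E tells the Perl-compatible engine to treat the enclosed text literally.
pattern <- paste0("\\Q", ordered, "\\E", collapse = "|")
regex_result <- gsub(pattern, " XXX ", texts, perl = TRUE)

identical(loop_result, regex_result)
#[1] TRUE
```

Both variants give the same string as the gsub() call above for this input. The \Q...\E approach would break if an emoticon itself contained "\E", so treat it as a convenience rather than a general-purpose escaper.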

Upvotes: 2
