jan5

Reputation: 1179

Extracting specified word from a vector using R

I have a text string, e.g.

text <- "i am happy today :):)"

I want to extract :) from the text vector and report its frequency.

Upvotes: 3

Views: 1301

Answers (3)

Sacha Epskamp

Reputation: 47551

I assume you only want the count, or do you also want to remove :) from the string?

For the count you can do:

length(gregexpr(":)",text)[[1]])

which gives 2. A more generalized solution for a vector of strings is:

sapply(gregexpr(":)",text),length)

Edit:

Josh O'Brien pointed out that this also returns 1 if there is no :), since gregexpr returns -1 (a vector of length 1) in that case. To fix this you can use:

sapply(gregexpr(":)",text),function(x)sum(x>0))

This is slightly less pretty, though.
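As a sketch of the same idea, the count can be wrapped in a small helper. The name count_fixed is hypothetical and not part of the original answer. It assumes the pattern should be matched literally, so it passes fixed = TRUE to gregexpr, which means the ")" is never read as regex syntax:

```r
# Hypothetical helper (not from the original answer): count literal
# occurrences of `pattern` in each element of the character vector `x`.
count_fixed <- function(x, pattern) {
  # fixed = TRUE treats the pattern as a plain string, not a regex
  hits <- gregexpr(pattern, x, fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count only
  # the positive match positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

smiley_counts <- count_fixed(c("i am happy today :):)", "no smiley here"), ":)")
smiley_counts
# [1] 2 0
```

vapply is used instead of sapply only so that the result type is fixed to integer; the counting logic is the same as above.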

Upvotes: 3

BenBarnes

Reputation: 19454

This does the trick but might not be the most direct way:

mytext<- "i am happy today :):)"

# The following line inserts semicolons to split on
myTextSub<-gsub(":)", ";:);", mytext)

# Then split and unlist
myTextSplit <- unlist(strsplit(myTextSub, ";"))

# Then see how many times the smiley turns up
length(grep(":)", myTextSplit))

EDIT

To handle vectors of text with length > 1, don't unlist:

mytext<- rep("i am happy today :):)",2)
myTextSub<-gsub(":\\)", ";:\\);", mytext)
myTextSplit <- strsplit(myTextSub, ";")

sapply(myTextSplit,function(x){
  length(grep(":)", x))
})

But I like the other answers better.
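If the matches themselves are wanted as well as the count, a more direct route than splitting might be regmatches together with gregexpr. This is only a sketch: the variable names are illustrative, and it assumes R 3.2.0 or later for lengths():

```r
texts <- c("i was happy yesterday :):)",
           "i am happy today :)",
           "will i be happy tomorrow?")

# Extract every literal ":)" from each string; the result is a list with
# one character vector per input element (empty when nothing matches)
matches <- regmatches(texts, gregexpr(":)", texts, fixed = TRUE))

# The frequency per string is the number of extracted matches
match_counts <- lengths(matches)
match_counts
# [1] 2 1 0
```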

Upvotes: 1

Josh O'Brien

Reputation: 162321

Here's one idea, which would be easy to generalize:

text<- c("i was happy yesterday :):)",
         "i am happy today :)",
         "will i be happy tomorrow?")

(nchar(text) - nchar(gsub(":)", "", text))) / 2
# [1] 2 1 0
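One way the generalization might look: the divisor 2 above is just the number of characters in ":)", so it can be replaced by nchar(pattern). The function name count_by_nchar is hypothetical, and the sketch assumes a literal (non-regex) pattern, hence fixed = TRUE:

```r
# Hypothetical generalization (not from the original answer): count literal
# occurrences of `pattern` from the number of characters removed by gsub().
count_by_nchar <- function(x, pattern) {
  # Delete every occurrence of the pattern, treating it as a plain string
  stripped <- gsub(pattern, "", x, fixed = TRUE)
  # Characters removed, divided by the pattern length, gives the count
  (nchar(x) - nchar(stripped)) / nchar(pattern)
}

text <- c("i was happy yesterday :):)",
          "i am happy today :)",
          "will i be happy tomorrow?")

counts <- count_by_nchar(text, ":)")
counts
# [1] 2 1 0
```

The same call works for longer patterns, for example count_by_nchar(text, "happy").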

Upvotes: 5
