Reputation: 404
Hi, I am trying to find a short text in a sentence and then do some manipulation. It is easy in Java, but in R I am having an issue: the code never enters the if block. Here is my code:
rm(list=ls())
library(tidytext)
library(dplyr)
shortText= c('grt','gr8','bcz','ur')
tweet=c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome'
)
tweet=data.frame(tweet, stringsAsFactors = FALSE)
for(row in 1:nrow(tweet)) {
  tweetWords=strsplit(tweet[row,]," ")
  print(tweetWords)
  for (word in 1:length(tweetWords)) {
    if(tweetWords[word] %in% shortText){
      print('we have a match')
    }
  }
}
Upvotes: 1
Views: 1085
Reputation: 4505
Could it be something like this? Note that grepl matches substrings here, so 'ur' is also found inside 'your':
cbind(tweet, ifelse(sapply(shortText, grepl, x = tweet), "Match is found", "No match"))
     tweet                        grt              gr8              bcz        ur
[1,] "stats is gr8"               "No match"       "Match is found" "No match" "No match"
[2,] "this car is good"           "No match"       "No match"       "No match" "No match"
[3,] "your movie is grt"          "Match is found" "No match"       "No match" "Match is found"
[4,] "i hate your book of hatred" "No match"       "No match"       "No match" "Match is found"
[5,] "food is awsome"             "No match"       "No match"       "No match" "No match"
Upvotes: 0
Reputation: 522712
Here is a straightforward base R option using grepl:
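shortText <- c('grt','gr8','bcz','ur')
tweet <- c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome')
# One column per keyword, one row per tweet; TRUE where the keyword appears as a whole word
res <- sapply(shortText, function(x) grepl(paste0("\\b", x, "\\b"), tweet))
# Keep the tweets that matched at least one keyword
tweet[rowSums(res) > 0]
[1] "stats is gr8"      "your movie is grt"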
The basic idea is to generate a logical matrix whose rows are the tweets and whose columns are the keywords. If a row contains one or more TRUE values, that tweet matched at least one keyword.
Note carefully that I surround each search term with word boundaries (\b). This is necessary so that a search term does not falsely match as a substring of a larger word.
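As a small illustration of why the boundaries matter, the sketch below compares a plain substring search with a bounded one for the keyword 'ur'. The variable names plain and bounded are made up for this example, and the tweets are a subset of the ones in the question.

```r
tweet <- c('stats is gr8', 'your movie is grt', 'i hate your book of hatred')

# Plain substring search: 'ur' is found inside 'your'
plain <- grepl("ur", tweet)

# Bounded search: 'ur' must stand alone as a whole word
bounded <- grepl("\\bur\\b", tweet)

print(plain)    # FALSE TRUE TRUE
print(bounded)  # FALSE FALSE FALSE
```

Without the boundaries, both tweets containing 'your' would be reported as matches for 'ur'.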
Upvotes: 1
Reputation: 1233
There are many ways to improve this, but here is a quick solution with minimal changes to your code:
shortText= c('grt','gr8','bcz','ur')
tweet=c('stats is gr8','this car is good','your movie is grt','i hate your book of hatred','food is awsome'
)
tweet=data.frame(tweet, stringsAsFactors = FALSE)
for(row in 1:nrow(tweet)) {
  tweetWords=strsplit(tweet[row,]," ")
  print(tweetWords)
  for (word in 1:length(tweetWords)) {
    if(any(tweetWords[word][[1]] %in% shortText)){
      print('we have a match')
    }
  }
}
returns:
[[1]]
[1] "stats" "is" "gr8"
[1] "we have a match"
[[1]]
[1] "this" "car" "is" "good"
[[1]]
[1] "your" "movie" "is" "grt"
[1] "we have a match"
[[1]]
[1] "i" "hate" "your" "book" "of" "hatred"
[[1]]
[1] "food" "is" "awsome"
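The [[1]] extracts the character vector of words from the list that strsplit returns, so %in% compares individual words against shortText. Adding any makes the if statement run when any of the resulting logical values is TRUE; without it, if would have used only the first element (and from R 4.2.0 onwards a condition of length greater than one is an error).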
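For comparison, here is a sketch of the same check without explicit index loops. It assumes tweet is kept as a plain character vector rather than a data frame, and has_match is a name made up for this example.

```r
shortText <- c('grt', 'gr8', 'bcz', 'ur')
tweet <- c('stats is gr8', 'this car is good', 'your movie is grt',
           'i hate your book of hatred', 'food is awsome')

# strsplit() returns a list with one character vector of words per tweet
tweetWords <- strsplit(tweet, " ")

# For each tweet, check whether any of its words is in shortText
has_match <- vapply(tweetWords,
                    function(words) any(words %in% shortText),
                    logical(1))

# Tweets that contain at least one short text as a whole word
print(tweet[has_match])
```

Because the comparison is done on whole words, 'ur' does not match 'your' here, and the logical vector has_match can be used directly for further manipulation of the matching tweets.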
Upvotes: 0