Reputation: 482
I need to detect the rows of a data frame/tibble that contain a specific sequence of characters. My sequence is:
seq <- "RT @AventusSystems"
df <- structure(list(text = c("@AventusSystems Wow, what a upgrade from help of investor",
"RT @AventusSystems: A recent article about our investors as shown in Forbes! t.co/n8oGwiEDpu #Aventus #GlobalAdvisors #4thefans #Ti…",
"@AventusSystems Very nice to have this project", "RT @AventusSystems: Join the #TicketRevolution with #Aventus today! #Aventus #TicketRevolution #AventCoin #4thefans t.co/OPlyCFmW4a"
), Tweet_Id = c("898359464444559360", "898359342952439809", "898359326552633345",
"898359268226736128"), created_at = structure(c(17396, 17396,
17396, 17396), class = "Date")), .Names = c("text", "Tweet_Id",
"created_at"), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
select(df, contains(seq))
# A tibble: 4 x 0
sapply(df$text, grepl, seq)
returns only four FALSE values.
What am I doing wrong? What is the correct solution? Thank you for your help.
Upvotes: 0
Views: 49
Reputation: 436
First, grepl is already vectorized over its argument x, so you don't need sapply. You could just do grepl(seq, df$text).
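As a minimal sketch of the vectorized call described above: the text vector here is a shortened, hypothetical stand-in for df$text from the question, and fixed = TRUE is an optional addition (not part of the original suggestion) that makes grepl treat the pattern as a literal string rather than a regular expression.

```r
# Shortened stand-in for df$text (assumption: the real tweets are longer)
text <- c("@AventusSystems Wow, what a upgrade from help of investor",
          "RT @AventusSystems: A recent article about our investors",
          "@AventusSystems Very nice to have this project",
          "RT @AventusSystems: Join the #TicketRevolution with #Aventus today!")
seq <- "RT @AventusSystems"

# Pattern first, character vector second: one logical per element of text
hits <- grepl(seq, text, fixed = TRUE)
print(hits)

# The logical vector can index the original vector (or the rows of a data frame)
print(text[hits])
```

With this stand-in data, only the second and fourth elements should be flagged.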
Your code doesn't work because sapply passes each element of its X argument to the function given in FUN as that function's first argument. The first argument of grepl is the pattern, so each tweet becomes the search pattern: you are looking for "@AventusSystems Wow, what a upgrade from help of investor", etc. inside your seq object, instead of the other way around.
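The argument order issue can be illustrated with a small sketch. The two tweets below are a hypothetical subset of the question's data; the second call shows one possible way to keep sapply by naming the pattern argument, although the plain vectorized grepl call is simpler.

```r
# Two tweets standing in for df$text (assumption: subset of the real data)
text <- c("@AventusSystems Very nice to have this project",
          "RT @AventusSystems: Join the #TicketRevolution with #Aventus today!")
seq <- "RT @AventusSystems"

# What the question's call does: grepl(pattern = <tweet>, x = seq),
# i.e. each tweet is used as the pattern and seq is the text being searched
wrong <- sapply(text, grepl, seq)

# Naming the pattern makes each tweet fall into grepl's second argument, x
right <- sapply(text, grepl, pattern = seq, fixed = TRUE)

print(unname(wrong))
print(unname(right))
```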
Lastly, dplyr::select selects columns, whereas you want to use dplyr::filter, which filters rows.
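Putting the pieces together, a sketch of the row filter might look like the following. It assumes dplyr is installed, uses a reduced two-column, hypothetical version of the question's df, and again adds fixed = TRUE as an optional choice. For context, contains() inside select matches against column names, which is why the original select call returned a tibble with four rows and no columns.

```r
library(dplyr)

# Reduced stand-in for the question's df (assumption: fewer columns, shorter text)
df <- data.frame(
  text = c("@AventusSystems Wow, what a upgrade from help of investor",
           "RT @AventusSystems: A recent article about our investors",
           "@AventusSystems Very nice to have this project",
           "RT @AventusSystems: Join the #TicketRevolution with #Aventus today!"),
  Tweet_Id = c("898359464444559360", "898359342952439809",
               "898359326552633345", "898359268226736128"),
  stringsAsFactors = FALSE
)
seq <- "RT @AventusSystems"

# filter() keeps the rows for which the logical condition is TRUE
retweets <- filter(df, grepl(seq, text, fixed = TRUE))
print(retweets)
```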
Upvotes: 2