gabx

Reputation: 482

R: get dataframe row with specific characters

I need to detect rows of a df/tibble containing a specific sequence of characters.

My sequence is:

seq <- "RT @AventusSystems"

df <- structure(list(text = c("@AventusSystems Wow, what a upgrade from help of investor", 
"RT @AventusSystems: A recent article about our investors as shown in Forbes! t.co/n8oGwiEDpu #Aventus #GlobalAdvisors #4thefans #Ti…", 
"@AventusSystems Very nice to have this project", "RT @AventusSystems: Join the #TicketRevolution with #Aventus today! #Aventus #TicketRevolution #AventCoin #4thefans t.co/OPlyCFmW4a"
), Tweet_Id = c("898359464444559360", "898359342952439809", "898359326552633345", 
"898359268226736128"), created_at = structure(c(17396, 17396, 
17396, 17396), class = "Date")), .Names = c("text", "Tweet_Id", 
"created_at"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))

select(df, contains(seq))
# A tibble: 4 x 0

sapply(df$text, grepl, seq) returns only 4 FALSE values.

What am I doing wrong? What is the correct solution? Thank you for your help.

Upvotes: 0

Views: 49

Answers (1)

Taylor H

Reputation: 436

First, grepl is already vectorized over its argument x, so you don't need sapply. You could just do grepl(seq, df$text).
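A minimal sketch of the vectorized call. The `texts` vector is a shortened, illustrative stand-in for df$text, and `fixed = TRUE` is an optional addition that treats the sequence as a literal string rather than a regular expression:

```r
# Shortened stand-in for df$text (illustrative only, not the full tweets)
texts <- c("@AventusSystems Wow, what a upgrade",
           "RT @AventusSystems: A recent article",
           "@AventusSystems Very nice to have this project",
           "RT @AventusSystems: Join the #TicketRevolution")
seq <- "RT @AventusSystems"

# grepl(pattern, x) is vectorized over x: it returns one logical per element
hits <- grepl(seq, texts, fixed = TRUE)
print(hits)
```

The second and fourth elements should come back TRUE, since only those start with the retweet prefix.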

Your code doesn't work because sapply passes each element of its X argument to FUN as the first argument, and the first argument of grepl is the pattern. So each tweet becomes the search pattern and seq becomes the string being searched: you are looking for the pattern "@AventusSystems Wow, what a upgrade from help of investor", etc. in your seq object.
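The argument order can be checked on a single tweet. This is only a sketch: `tweet` is a shortened, illustrative string, and the last line shows one possible way to keep sapply by naming the pattern argument:

```r
seq <- "RT @AventusSystems"
tweet <- "RT @AventusSystems: Join the #TicketRevolution"  # shortened, illustrative

# What sapply(df$text, grepl, seq) effectively calls for each element:
# the tweet is used as the pattern and seq is the string being searched
wrong <- grepl(tweet, seq)

# Intended call: seq is the pattern and the tweet is the string being searched
right <- grepl(seq, tweet)

# If sapply is kept anyway, naming the pattern makes each element fill x instead
via_sapply <- sapply(tweet, grepl, pattern = seq, USE.NAMES = FALSE)

print(c(wrong = wrong, right = right, via_sapply = via_sapply))
```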

Lastly, dplyr::select selects columns (and contains() matches column names, not cell values), whereas you want to use dplyr::filter, which filters rows.
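Putting the pieces together, a sketch of the row filter might look like this. The data frame below is a cut-down, illustrative version of the question's df (shortened texts, same Tweet_Id values), and the dplyr branch only runs if dplyr is installed:

```r
# Cut-down version of the question's data (texts shortened for illustration)
df <- data.frame(
  text = c("@AventusSystems Wow, what a upgrade",
           "RT @AventusSystems: A recent article",
           "@AventusSystems Very nice to have this project",
           "RT @AventusSystems: Join the #TicketRevolution"),
  Tweet_Id = c("898359464444559360", "898359342952439809",
               "898359326552633345", "898359268226736128"),
  stringsAsFactors = FALSE
)
seq <- "RT @AventusSystems"

# Base R: keep the rows whose text contains the sequence
rt_rows <- df[grepl(seq, df$text, fixed = TRUE), ]

# dplyr equivalent: filter() keeps rows where the condition is TRUE
if (requireNamespace("dplyr", quietly = TRUE)) {
  rt_rows_dplyr <- dplyr::filter(df, grepl(seq, text, fixed = TRUE))
  print(rt_rows_dplyr)
}

print(rt_rows)
```

Both approaches should keep only the two retweet rows.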

Upvotes: 2
