Reputation: 365
I have a dataframe with 3 columns that I would like to search. I have a list of strings to search for in each column. I would like to return a dataframe with the original data plus one column per string in the list, indicating whether that string is found in any of that row's columns.
Here is a simplified version of something that approximates my data.
strings <- c("ape", "bear", "cat", "dog")
# A tibble: 7 x 3
  snippet                  headline          abstract
  <chr>                    <chr>             <chr>
1 this is an ape           An ape            some random
2 blah blah blah           An ape            some random
3 this is some random text some random text  some ape stuff
4 this is a bear           this is a bear    bear time
5 some cat text            bear time         dog time
6 cat and dog text         blah blah         blah
7 blah blah blah           this is just text blah
Output of dput(df):
dput(df)
structure(list(snippet = c("this is an ape", "blah blah blah",
"this is some random text", "this is a bear", "some cat text",
"cat and dog text", "blah blah blah"), headline = c("An ape",
"An ape", "some random text", "this is a bear", "bear time",
"blah blah", "this is just text"), abstract = c("some random",
"some random", "some ape stuff", "bear time", "dog time", "blah",
"blah")), .Names = c("snippet", "headline", "abstract"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L))
I would like it to return something like the following dataframe:
# A tibble: 7 x 7
  snippet                  headline          abstract       ape   bear  cat   dog
  <chr>                    <chr>             <chr>          <lgl> <lgl> <lgl> <lgl>
1 this is an ape           An ape            some random    TRUE  FALSE FALSE FALSE
2 blah blah blah           An ape            some random    TRUE  FALSE FALSE FALSE
3 this is some random text some random text  some ape stuff TRUE  FALSE FALSE FALSE
4 this is a bear           this is a bear    bear time      FALSE TRUE  FALSE FALSE
5 some cat text            bear time         dog time       FALSE TRUE  TRUE  TRUE
6 cat and dog text         blah blah         blah           FALSE FALSE TRUE  TRUE
7 blah blah blah           this is just text blah           FALSE FALSE FALSE FALSE
I have used grepl to return the rows I need, but there is clearly a better way to do this that also keeps track of which string matches in which row.
Thank you in advance for your help.
Upvotes: 1
Views: 91
Reputation: 26258
Since you don't need to know which column the string is found in, you can collapse each row into a single string and run grepl on that.
Something like:
strings <- c("ape", "bear", "cat", "dog")
df$colStrings <- with(df, paste(snippet, headline, abstract, sep = ","))
sapply(strings, function(x) grepl(x, df$colStrings))
#        ape  bear   cat   dog
# [1,]  TRUE FALSE FALSE FALSE
# [2,]  TRUE FALSE FALSE FALSE
# [3,]  TRUE FALSE FALSE FALSE
# [4,] FALSE  TRUE FALSE FALSE
# [5,] FALSE  TRUE  TRUE  TRUE
# [6,] FALSE FALSE  TRUE  TRUE
# [7,] FALSE FALSE FALSE FALSE
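The sapply call returns a logical matrix, while the question asks for the original data with the indicator columns attached. A minimal sketch of that last step is below, assuming base R only. The names `combined`, `hits` and `result` are illustrative and not from the question, and `fixed = TRUE` assumes the search terms are literal strings, not regular expressions.

```r
# Sample data reconstructed from the dput() output in the question.
df <- data.frame(
  snippet  = c("this is an ape", "blah blah blah", "this is some random text",
               "this is a bear", "some cat text", "cat and dog text",
               "blah blah blah"),
  headline = c("An ape", "An ape", "some random text", "this is a bear",
               "bear time", "blah blah", "this is just text"),
  abstract = c("some random", "some random", "some ape stuff", "bear time",
               "dog time", "blah", "blah"),
  stringsAsFactors = FALSE
)
strings <- c("ape", "bear", "cat", "dog")

# Collapse the three text columns into one string per row, without
# adding a helper column to df (`combined` is an illustrative name).
combined <- paste(df$snippet, df$headline, df$abstract, sep = ",")

# One logical column per search string. fixed = TRUE treats each string as
# literal text, not a regex (an assumption about the search terms).
hits <- sapply(strings, function(s) grepl(s, combined, fixed = TRUE))

# Append the indicator columns to the original data.
result <- cbind(df, hits)
print(result)
```

Note that this is substring matching, so "cat" would also match "category". If whole words matter, one option is to drop fixed = TRUE and wrap each term in \\b word boundaries.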
Upvotes: 5