Reputation: 186
I have 2 dataframes in R and I want to do a query using the dataframe "y" like parameter to dataframe "x".
I have this code:
x <- c('The book is on the table','I hear birds outside','The electricity
came back')
x <- data.frame(x)
colnames(x) <- c('text')
x
y <- c('book','birds','electricity')
y <- data.frame(y)
colnames(y) <- c('search')
y
r <- sqldf("select * from x where text IN (select search from y)")
r
I think to use "like" here, but I don´t know. Can you helpme ?
Upvotes: 2
Views: 659
Reputation: 2826
If you want a sqldf
solution, I think that this would work:
sqldf("select x.text, y.search FROM x JOIN y on x.text LIKE '%' || y.search || '%'")
## text search
## 1 The book is on the table book
## 2 I hear birds outside birds
## 3 The electricity \ncame back electricity
Upvotes: 2
Reputation: 3274
It's hard to know if this is what you want without a more varied fixture. To add a little bit of variation, I added an extra word to y$search
- y = c('book','birds','electricity', 'cat')
. More variation would further clarify
Just know which words are in which statements? sapply
and grepl
> m = sapply(y$search, grepl, x$text)
> rownames(m) = x$text
> colnames(m) = y$search
> m
book birds electricity cat
The book is on the table TRUE FALSE FALSE FALSE
I hear birds outside FALSE TRUE FALSE FALSE
The electricity \ncame back FALSE FALSE TRUE FALSE
Pulling out just the matching rows?
> library(magrittr) # To use the pipe, "%>%"
> x %>% data.table::setDT() # To return the result as a table easily
>
> x[(sapply(y$search, grepl, x$text) %>% rowSums() %>% as.logical()) * (1:nrow(x)), ]
text
1: The book is on the table
2: I hear birds outside
3: The electricity \ncame back
@Aurèle's solution will give the best result for matching text and the text it match to. Note that if back
was also in y$search
, the text The electricity \ncame back
would get reported twice in the result for the different search terms matched, so this is better in the case that uniqueness is not important.
So it largely depends on your desired output.
Upvotes: 0
Reputation: 12819
You could use the fuzzyjoin
package:
library(dplyr)
library(fuzzyjoin)
regex_join(
mutate_if(x, is.factor, as.character),
mutate_if(y, is.factor, as.character),
by = c("text" = "search")
)
# text search
# 1 The book is on the table book
# 2 I hear birds outside birds
# 3 The electricity \ncame back electricity
Upvotes: 0