Reputation: 1748
Here is some data.
library(stringr)
library(dplyr)
df <- tibble(sentences)
I want to identify all sentences that contain the word "her." But the filter below, of course, also returns sentences containing words like "there" and "here."
df %>% filter(str_detect(sentences, "her"))
# A tibble: 43 x 1
sentences
<chr>
1 The boy was there when the sun rose.
2 Help the woman get back to her feet.
3 What joy there is in living.
4 There are more than two factors here.
5 Cats and dogs each hate the other.
6 The wharf could be seen at the farther shore.
7 The tiny girl took off her hat.
8 Write a fond note to the friend you cherish.
9 There was a sound of dry leaves outside.
10 Add the column and put the sum here.
The documentation for stringr::str_detect says, "Match character, word, line and sentence boundaries with boundary()." I can't figure out how to do this, nor can I find an example anywhere. All of the documentation examples involve the str_split or str_count functions.
My question is related to this question, but I would specifically like to understand how to use the stringr::boundary function.
Upvotes: 1
Reputation: 887951
We can specify the word boundary (\\b) at the start and end to avoid any partial matches:
library(stringr)
library(dplyr)
df %>%
filter(str_detect(sentences, "\\bher\\b"))
# sentences
#1 Help the woman get back to her feet.
#2 The tiny girl took off her hat.
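Or use boundary("word") to split each sentence into words and keep the sentences that contain "her" as a whole word. Note that boundary() takes a boundary type ("character", "line_break", "sentence" or "word"), not the word to search for, so boundary("her") raises an error:
df %>%
  filter(vapply(str_split(sentences, boundary("word")),
                function(words) "her" %in% words, logical(1)))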
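To see how boundary("word") behaves on its own, here is a minimal sketch on a toy vector. The vector x and the names words, has_her and has_her_regex are made up for illustration and are not part of the question's data. It splits each string into words and then checks for "her" as a complete word, which should agree with the \\b regex above.

```r
library(stringr)

# Hypothetical toy data, not the stringr::sentences dataset from the question
x <- c("She took off her hat.", "Put the sum here.", "There is joy in living.")

# boundary() takes a boundary *type*; with "word" each string is split into words
words <- str_split(x, boundary("word"))

# TRUE where "her" appears as a complete word in the sentence
has_her <- vapply(words, function(w) "her" %in% w, logical(1))

# The same check written with regex word boundaries
has_her_regex <- str_detect(x, "\\bher\\b")

# Keep only the sentences that contain "her" as a whole word
x[has_her]
```

Both checks are case-sensitive, so a sentence that starts with "Her" would not be matched by either one.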
Upvotes: 3