John J.

Reputation: 1748

How to use boundary() with str_detect (stringr package)

Here is some data (sentences is the character vector that ships with stringr).

library(stringr)
library(dplyr)

df <- tibble(sentences)

I want to identify all sentences that contain the word "her". A plain substring match, of course, also returns sentences with words like "there" and "here":

df %>% filter(str_detect(sentences, "her"))
# A tibble: 43 x 1
   sentences                                    
   <chr>                                        
 1 The boy was there when the sun rose.         
 2 Help the woman get back to her feet.         
 3 What joy there is in living.                 
 4 There are more than two factors here.        
 5 Cats and dogs each hate the other.           
 6 The wharf could be seen at the farther shore.
 7 The tiny girl took off her hat.              
 8 Write a fond note to the friend you cherish. 
 9 There was a sound of dry leaves outside.     
10 Add the column and put the sum here. 

The documentation for stringr::str_detect says, "Match character, word, line and sentence boundaries with boundary()." I can't figure out how to do this, nor can I find an example anywhere. All of the documentation examples involve the str_split or str_count functions.

My question is related to this question, but I would specifically like to understand how to use the stringr::boundary function.

Upvotes: 1

Views: 792

Answers (1)

akrun

Reputation: 887951

We can put a word boundary (\\b) at the start and end of the pattern to avoid partial matches:

library(stringr)
library(dplyr)
df %>% 
    filter(str_detect(sentences, "\\bher\\b"))
#                             sentences
#1 Help the woman get back to her feet.
#2      The tiny girl took off her hat.
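
As a quick self-contained check, here is a minimal sketch on a made-up vector (x is hypothetical, not taken from the sentences data). \\b only matches where a word character meets a non-word character, so "there" and "here" should be skipped:

```r
library(stringr)

# Hypothetical toy input, not from stringr::sentences
x <- c("The boy was there.", "She took off her hat.", "Put the sum here.")

# Plain substring match: hits all three, because "there" and "here" contain "her"
plain <- str_detect(x, "her")

# Word-bounded match: only the standalone word "her" qualifies
bounded <- str_detect(x, "\\bher\\b")
```

Note that the match is case-sensitive, so a sentence starting with "Her" would need regex("\\bher\\b", ignore_case = TRUE).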

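Or use boundary(). Note that boundary() takes a boundary type ("character", "line_break", "sentence" or "word"), not a search pattern, so boundary("her") fails. Instead, split each sentence into words and check whether "her" is one of them:

df %>%
    filter(vapply(str_split(sentences, boundary("word")), function(w) "her" %in% w, logical(1)))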
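
To see what boundary() does on its own, here is a small sketch on a made-up vector (x is hypothetical). As far as I know, boundary() selects a type of boundary such as "word" rather than wrapping a search string, which is why the documentation examples pair it with str_split and str_count:

```r
library(stringr)

# Hypothetical toy input, not from stringr::sentences
x <- c("The boy was there.", "She took off her hat.")

# Split each string at word boundaries; spaces and punctuation are dropped by default
words <- str_split(x, boundary("word"))

# Count the words in each string
n_words <- str_count(x, boundary("word"))

# A string qualifies only if "her" is one of its whole words
has_her <- vapply(words, function(w) "her" %in% w, logical(1))
```

Passing boundary("word") straight to str_detect would only ask whether a string contains any word boundary at all, which is true for nearly every sentence, so the \\b regex is usually the simpler route for filtering.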

Upvotes: 3
