generic

Reputation: 301

R Regex for Positive Look-Around to Match Following

I have a dataframe in R. I want to match with and keep the row if

phrases_with_woman <- structure(list(phrase = c("woman get degree", "woman obtain justice", 
"session woman vote for member", "woman have to end", "woman have no existence", 
"woman lose right", "woman be much", "woman mix at dance", "woman vote as member", 
"woman have power", "woman act only", "she be woman", "no committee woman passed vote")), row.names = c(NA, 
-13L), class = "data.frame")

In the above example, I want to match all rows except "she be woman".

This is my code so far. I have a positive look-behind, "(?<=woman\\s)\\w+", that seems to be on the right track, but it allows too many words before "woman". I tried using {1} to allow just one preceding word, but that syntax didn't work.

library(dplyr)
library(stringr)

matches <- phrases_with_woman %>%
  filter(str_detect(phrase, "^woman|(?<=woman\\s)\\w+"))
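
As a rough check of why this pattern is too permissive, here is a small sketch. It assumes base R's grepl with perl = TRUE instead of stringr, and the phrase in `extra` is made up for illustration and is not in the data. The look-behind only requires that some word follows "woman "; nothing ties "woman" to the start of the string.

```r
# The pattern from the question, unchanged.
pat_q <- "^woman|(?<=woman\\s)\\w+"

# Hypothetical phrase (not in the data): "woman" is the fourth word.
extra <- "she say that woman vote"

# perl = TRUE is needed in base R for look-behind support.
hit <- grepl(pat_q, extra, perl = TRUE)
hit
# [1] TRUE
```

The phrase is detected even though three words precede "woman", so the position of "woman" has to be constrained from the start of the string rather than by the look-behind.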

Help is appreciated.

Upvotes: 5

Views: 93

Answers (2)

G. Grothendieck

Reputation: 269694

Each condition can be written as one alternative in the pattern, although the last one needs two alternatives, assuming that no/not/never can be either the first or the second word.

library(dplyr)

pat <- "^(woman|\\w+ woman|\\w+ (no|not|never) woman|(no|not|never) \\w+ woman)\\b"
phrases_with_woman %>%
  filter(grepl(pat, phrase))
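
A minimal base R sketch for checking which alternative admits each phrase, using four of the sample phrases. The names `phrases` and `keep` are illustrative and not from the original post.

```r
# Four phrases taken from the sample data in the question.
phrases <- c("woman get degree",
             "she be woman",
             "session woman vote for member",
             "no committee woman passed vote")

# Same pattern as above; the alternatives, in order:
#   woman                       -> "woman" is the first word
#   \\w+ woman                  -> "woman" is the second word
#   \\w+ (no|not|never) woman   -> third word, negation is the second word
#   (no|not|never) \\w+ woman   -> third word, negation is the first word
pat <- "^(woman|\\w+ woman|\\w+ (no|not|never) woman|(no|not|never) \\w+ woman)\\b"

keep <- grepl(pat, phrases)
phrases[!keep]
# [1] "she be woman"
```

Because the whole group is anchored with ^, "woman" can only match as the first, second, or (after a negation) third word, which is why "she be woman" is dropped.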

Upvotes: 6

Darren Tsai

Reputation: 35584

I haven't come up with a regex solution but here is a workaround.

library(dplyr)
library(stringr)

phrases_with_woman %>%
  filter(str_detect(word(phrase, 1, 2), "\\bwoman\\b") |
         (word(phrase, 3) == "woman" & str_detect(word(phrase, 1, 2), "\\b(no|not|never)\\b")))

#                            phrase
# 1                woman get degree
# 2            woman obtain justice
# 3   session woman vote for member
# 4               woman have to end
# 5         woman have no existence
# 6                woman lose right
# 7                   woman be much
# 8              woman mix at dance
# 9            woman vote as member
# 10               woman have power
# 11                 woman act only
# 12 no committee woman passed vote

Upvotes: 4
