Lorenz Walthert

Reputation: 4639

matching start of a string but not end in R

How can I match all words starting with plan_ and not ending with template without using invert = TRUE? In the example below, I'd like to match only the second string. I tried a negative lookahead, but it does not work, maybe because of greediness?

names <- c("plan_x_template", "plan_x")
grep("^plan.*(?!template)$", 
  names, 
  value = TRUE, perl = TRUE
)
#> [1] "plan_x_template" "plan_x"    

I mean one can also solve the problem with two regex calls but I'd like to see how it works the other way :-)

is_plan <- grepl("^plan_", names)
is_template <- grepl("_template$", names)
names[is_plan & !is_template]
#> [1] "plan_x"

Upvotes: 3

Views: 1147

Answers (1)

Wiktor Stribiżew

Reputation: 626845

You may use

names <- c("plan_x_template", "plan_x")
grep("^plan(?!.*template)", 
  names, 
  value = TRUE, perl = TRUE
)

See the R online demo
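One caveat worth checking: ^plan(?!.*template) rejects any string that contains template anywhere after plan, not only at the end, and it does not require the underscore. If the stricter reading of the question is needed, a possible variant is sketched below. The pattern ^plan_(?!.*template$) and the two extra test strings are assumptions added for illustration, not part of the original question.

```r
# "plan_template_v2" and "other_plan" are made-up test strings
names <- c("plan_x_template", "plan_x", "plan_template_v2", "other_plan")

# Pattern from the answer: rejects "template" anywhere after "plan"
loose <- grep("^plan(?!.*template)", names, value = TRUE, perl = TRUE)
loose
#> [1] "plan_x"

# Assumed stricter variant: require the "plan_" prefix and reject
# only names that END with "template"
strict <- grep("^plan_(?!.*template$)", names, value = TRUE, perl = TRUE)
strict
#> [1] "plan_x"           "plan_template_v2"
```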

The ^plan(?!.*template) pattern matches:

  • ^ - the start of the string
  • plan - the literal substring plan
  • (?!.*template) - a negative lookahead that fails the match if, immediately to the right of the current location, there are 0+ chars other than line break chars, as many as possible, followed by the template substring. (Since perl = TRUE is used, the pattern is processed with the PCRE engine, where . does not match line break chars by default, unlike the default TRE regex engine used by grep.)
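As for why the attempt in the question matches both strings: the greedy .* runs to the end of the string first, and the lookahead is only checked there. Nothing follows the end of the string, so (?!template) succeeds trivially and $ matches. The sketch below reproduces that and shows one possible alternative that keeps the check at the end by using a negative lookbehind instead; the lookbehind pattern is an addition for illustration, not part of the original answer.

```r
names <- c("plan_x_template", "plan_x")

# Original attempt: ".*" consumes the whole string, so the lookahead is
# tested at the very end, where "template" can never follow
attempt <- grep("^plan.*(?!template)$", names, value = TRUE, perl = TRUE)
attempt
#> [1] "plan_x_template" "plan_x"

# Lookbehind alternative: at the end of the string, check that the
# characters just before it are not "template"
lookbehind <- grep("^plan_.*(?<!template)$", names, value = TRUE, perl = TRUE)
lookbehind
#> [1] "plan_x"
```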

NOTE: In case of multiline strings, you need to use a DOTALL modifier in the regex, "(?s)^plan(?!.*template)".
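A small sketch of the difference, using two made-up strings that contain a line break (they are assumptions for illustration only). Without (?s), the .* inside the lookahead stops at the line break, never reaches template, and the first string slips through.

```r
# Hypothetical multiline inputs
multi <- c("plan_a\nb_template", "plan_a\nb")

# Without DOTALL: "." stops at "\n", so "template" on the second line
# is not seen and both strings match
without_s <- grep("^plan(?!.*template)", multi, value = TRUE, perl = TRUE)
length(without_s)
#> [1] 2

# With DOTALL: "." also matches "\n", so the first string is rejected
with_s <- grep("(?s)^plan(?!.*template)", multi, value = TRUE, perl = TRUE)
length(with_s)
#> [1] 1
```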

Upvotes: 5
