user3969377

R: How to match regex but not substring

I have strings that match a regex, but I would like to exclude the ones that end in a particular substring.

dat <- c('long_regex_other_stuff','long_regex_other_random.something')
(dat[grep('long_regex',dat)])
(dat[grep('long_regex.*(?!.*something$)',dat)])

The first grep returns the expected output:

"long_regex_other_stuff"            "long_regex_other_random.something"

How can I get the second grep to work? The desired output is

"long_regex_other_stuff"

Ref: Regular expression to match a line that doesn't contain a word?

Upvotes: 2

Views: 131

Answers (1)

Avinash Raj

Reputation: 174816

You need to move the .* that precedes the negative lookahead so that it comes after the lookahead instead, and enable Perl-compatible regexes through grep's perl argument, since lookaheads need that engine:

> dat <- c('long_regex','long_regex.something')
> (dat[grep('long_regex(?!.*something).*',dat, perl=TRUE)])
[1] "long_regex"
> (dat[grep('long_regex(?!.*\\bsomething\\b).*',dat, perl=TRUE)])
[1] "long_regex"

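The negative lookahead in long_regex(?!.*something) asserts that the string something does not appear anywhere after the substring long_regex. With .* placed before the lookahead, as in the question, the greedy .* can consume the rest of the string, so the lookahead is tested at the very end where nothing follows, and every string matches.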

> dat <- c('long_regex_other_stuff','long_regex_other_random.something')
> (dat[grep('long_regex(?!.*\\bsomething\\b).*',dat, perl=TRUE)])
[1] "long_regex_other_stuff"
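
As a rough sketch of why the position of the lookahead matters, the snippet below runs the question's pattern (with perl = TRUE added) next to a variant that keeps the question's $ anchor but puts the lookahead directly after the prefix. It also shows one possible alternative that avoids lookaheads by combining two grepl calls. The variable names before, after and combined are made up for illustration.

```r
dat <- c('long_regex_other_stuff', 'long_regex_other_random.something')

# Greedy .* before the lookahead: .* can run to the end of the string,
# where nothing follows, so the negative lookahead always succeeds.
before <- grepl('long_regex.*(?!.*something$)', dat, perl = TRUE)

# Lookahead directly after the prefix: it inspects the whole remainder
# and rejects strings that end in "something".
after <- grepl('long_regex(?!.*something$)', dat, perl = TRUE)

# Alternative without lookaheads (hypothetical, not from the answer):
# require the prefix and separately exclude the unwanted ending.
combined <- grepl('long_regex', dat) & !grepl('something$', dat)

print(before)         # TRUE TRUE
print(after)          # TRUE FALSE
print(dat[combined])  # "long_regex_other_stuff"
```

The two-call version is easier to read when the exclusion rule is simple, while the lookahead keeps everything in a single pattern.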

Upvotes: 1
