Flora Grappelli

Reputation: 679

Finding a word with condition in a vector with regex on R (perl)

I would like to find the elements of a vector that contain the word 'RT' (or 'R'), but not when 'RT' is preceded by 'no'.

'RT' may be preceded by nothing, a space, a dot, etc. I tried the following regex:

grep("(?<=[no] )RT", aaa, ignore.case = FALSE, perl = TRUE)

which returned all the elements containing "no RT",

and

grep("(?=[^no].*)RT", aaa, perl = TRUE)

which returned all the elements containing 'RT', with or without 'no' in front of it.

What is my mistake? I thought ^ matched everything except the characters that follow it.

Example:

aaa = c("RT alone", "no RT", "CT/RT", "adj.RTx", "RT/CT", "lang, RT+", "npo RT")

Upvotes: 2

Views: 81

Answers (1)

Wiktor Stribiżew

Reputation: 627020

(?<=[no] )RT matches any RT that is immediately preceded by "n " or "o ", because [no] is a character class that matches a single character (n or o), not the word "no".

Use a negative lookbehind instead:
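As a rough check, both attempts from the question can be run against the sample vector. This is only a sketch; the variable names first_try and second_try are made up for illustration. In the second pattern, the lookahead (?=[^no].*) is tested at the position where RT starts, so [^no] is compared against the R itself and always passes.

```r
aaa <- c("RT alone", "no RT", "CT/RT", "adj.RTx", "RT/CT", "lang, RT+", "npo RT")

# Positive lookbehind with a character class:
# matches RT only when it is preceded by "n " or "o "
first_try <- grep("(?<=[no] )RT", aaa, perl = TRUE)
aaa[first_try]
# "no RT"  "npo RT"

# The lookahead inspects the text starting at "R", which is never "n" or "o",
# so every element that contains RT matches
second_try <- grep("(?=[^no].*)RT", aaa, perl = TRUE)
length(second_try) == length(aaa)
# TRUE
```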

"(?<!no )RT"

See the regex demo.

Or, if "no" must be matched as a whole word only, add a word boundary:

"(?<!\\bno )RT"

See this regex demo.

Here, (?<!no ) makes sure there is no "no " immediately to the left of the current location, and only then is RT consumed.
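Applied to the vector from the question, the two patterns can be compared as below. This is a minimal sketch: the element "piano RT" is a hypothetical addition, not part of the original data, included only to show where the \\b version differs.

```r
aaa <- c("RT alone", "no RT", "CT/RT", "adj.RTx", "RT/CT", "lang, RT+", "npo RT")

# value = TRUE returns the matching elements instead of their indices
kept <- grep("(?<!no )RT", aaa, perl = TRUE, value = TRUE)
kept
# every element except "no RT"

# Hypothetical extra element where "no" is only the tail of a longer word
extra <- c(aaa, "piano RT")
plain <- grep("(?<!no )RT", extra, perl = TRUE, value = TRUE)
whole <- grep("(?<!\\bno )RT", extra, perl = TRUE, value = TRUE)

# The whole-word version keeps "piano RT"; the plain version drops it
setdiff(whole, plain)
# "piano RT"
```

On the original seven elements both patterns give the same result; the word boundary only matters when "no" can appear as the end of a longer word.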

Upvotes: 5
