Robin Kohrs

Reputation: 697

Inverting a regex in R

I have this character vector:

[1] "19980213"    "19980214"    "19980215"    "19980216"    "19980217"    "iffi"        "geometry"   
[8] "date_consid"

I want to match all elements that are neither dates nor "date_consid". I tried

res =  grep("(?!\\d{8})|(?!date_consid)", vec, value=T)

But I just can't make it work.

Upvotes: 4

Views: 547

Answers (2)

The fourth bird

Reputation: 163477

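The pattern you tried matches every element because the lookaheads are unanchored. A negative lookahead such as (?!\\d{8}) only has to succeed at some position in the string, and it always does, at the very latest at the end of the string.

Joining two separate lookaheads with | still matches all strings, because only one of the alternatives has to succeed.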
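A quick way to see this is to run the original pattern through grepl() with perl = TRUE (added here as an assumption, since R's default regex engine does not support lookarounds). This is a small sketch using the vector from the question:

```r
vec <- c("19980213", "19980214", "19980215", "19980216", "19980217",
         "iffi", "geometry", "date_consid")

# The original attempt: two unanchored negative lookaheads joined by |.
# Each lookahead can succeed at some position of any string (e.g. at the
# end of the string), so every element is reported as a match.
res <- grepl("(?!\\d{8})|(?!date_consid)", vec, perl = TRUE)
print(res)  # TRUE for all eight elements
```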

You can change the logic to a single check: assert, from the start of the string, that what directly follows is neither 8 digits nor date_consid.

This uses a negative lookahead, which R's default regex engine does not support, so you have to add perl=TRUE. Add the anchor ^ to assert the start of the string, and put the anchor $ after each alternative inside the lookahead, so that only strings that are exactly 8 digits or exactly date_consid are excluded.

 ^(?!\\d{8}$|date_consid$)
  • ^ Start of string
  • (?! Negative lookahead
    • \\d{8}$ Match 8 digits until end of string
    • | Or
    • date_consid$ Match date_consid until end of string
  • ) Close lookahead
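To illustrate why the $ anchors sit inside the lookahead, the sketch below compares the pattern with and without them. The inputs in vec2 are hypothetical edge cases, not from the question: a 9-digit string and a name that merely starts with date_consid.

```r
# Hypothetical edge cases, not taken from the question
vec2 <- c("19980213", "199802131", "date_consid", "date_consid_2")

# With $ inside the lookahead, only exact 8-digit strings and exactly
# "date_consid" are rejected
anchored <- grepl("^(?!\\d{8}$|date_consid$)", vec2, perl = TRUE)

# Without $, anything that merely starts with 8 digits or "date_consid"
# is rejected as well
unanchored <- grepl("^(?!\\d{8}|date_consid)", vec2, perl = TRUE)

print(anchored)    # FALSE  TRUE FALSE  TRUE
print(unanchored)  # FALSE FALSE FALSE FALSE
```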

For example

vec <- c("19980213", "19980214", "19980215", "19980216", "19980217", "iffi", "geometry", "date_consid")
grep("^(?!\\d{8}$|date_consid$)", vec, value = TRUE, perl = TRUE)

Output

[1] "iffi"     "geometry"

Upvotes: 4

Wiktor Stribiżew

Reputation: 627219

You can use

vec <- c("19980213", "19980214", "19980215", "19980216","19980217", "iffi","geometry", "date_consid")
grep("^(\\d{8}|date_consid)$", vec, value=TRUE, invert=TRUE)
## => [1] "iffi"     "geometry"


The ^(\d{8}|date_consid)$ regex matches a string that consists only of exactly eight digits or that is exactly date_consid.
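As a sketch of what that regex matches on its own (before any inversion), grepl() with R's default engine reports which elements are the "unwanted" ones:

```r
vec <- c("19980213", "19980214", "19980215", "19980216", "19980217",
         "iffi", "geometry", "date_consid")

# TRUE for the five 8-digit dates and for "date_consid";
# FALSE for "iffi" and "geometry"
matched <- grepl("^(\\d{8}|date_consid)$", vec)
print(matched)
```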

value=TRUE makes grep return the matching values rather than their indices, and invert=TRUE inverts the match result, so grep returns the elements that do not match.
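One way to check what invert=TRUE does is to compare it with negating grepl() and subsetting. This is a minimal sketch; the two forms should give the same result here:

```r
vec <- c("19980213", "19980214", "19980215", "19980216", "19980217",
         "iffi", "geometry", "date_consid")
pattern <- "^(\\d{8}|date_consid)$"

# invert = TRUE keeps the elements that do NOT match the pattern
kept_grep <- grep(pattern, vec, value = TRUE, invert = TRUE)

# Equivalent logical-subsetting form: negate grepl() and index the vector
kept_grepl <- vec[!grepl(pattern, vec)]

print(kept_grep)                        # "iffi" "geometry"
print(identical(kept_grep, kept_grepl)) # TRUE
```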

Upvotes: 5
