Rich Scriven

Reputation: 99331

Regular expression to keep some matches, remove others

I'm having trouble with this regular expression. Consider the following vector.

> vec <- c("new jersey", "south dakota", "virginia:chincoteague",
           "washington:whidbey island", "new york:main")

Of those strings that contain a :, I would like to keep only the ones with main after :, resulting in

[1] "new jersey" "south dakota" "new york:main"

So far, I've only been able to get there with this ugly nested nightmare, which is quite obviously far from optimal.

> g1 <- grep(":", vec)
> vec[ -g1[grep("main", grep(":", vec, value = TRUE), invert = TRUE)] ]
# [1] "new jersey"    "south dakota"  "new york:main"

How can I write a single regular expression to keep :main but remove others containing : ?

Upvotes: 4

Views: 98

Answers (2)

zx81

Reputation: 41838

You can use this single simple regex:

^[^:]+(?::main.*)?$

Not sure about the exact R code, but something like

grepl("^[^:]+(?::main.*)?$", vec, perl = TRUE)

Explanation

  • The ^ anchor asserts that we are at the beginning of the string
  • The [^:]+ matches one or more chars that are not a colon
  • The optional non-capturing group (?::main.*)? matches a colon, main and any chars that follow
  • The $ anchor asserts that we are at the end of the string
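
Applied to the vec from the question, the pattern can serve as a logical filter. This is a minimal sketch rather than a tested recipe: the name keep is only illustrative, and perl = TRUE is passed so the pattern is handled by R's PCRE engine.

```r
vec <- c("new jersey", "south dakota", "virginia:chincoteague",
         "washington:whidbey island", "new york:main")

# TRUE for strings with no colon at all, or with ":main" right after
# the leading colon-free part
keep <- grepl("^[^:]+(?::main.*)?$", vec, perl = TRUE)

vec[keep]
# [1] "new jersey"    "south dakota"  "new york:main"
```

Because [^:]+ requires at least one character, an empty string or a string that starts with a colon would not be kept.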

Upvotes: 3

falsetru

Reputation: 369094

Using | (pick the strings that contain :main, or that do not contain : at all):

> vec <- c("new jersey", "south dakota", "virginia:chincoteague",
+            "washington:whidbey island", "new york:main")
> grep(":main|^[^:]*$", vec)
[1] 1 2 5
> vec[grep(":main|^[^:]*$", vec)]
[1] "new jersey"    "south dakota"  "new york:main"
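
As a small variation on the alternation approach, grep can return the strings directly with value = TRUE, which avoids indexing vec a second time. The second call is only a sketch under an assumption about the intent: if the part after the colon must be exactly main, adding $ after :main would reject something like "ohio:mainland" (a made-up entry used purely for illustration).

```r
vec <- c("new jersey", "south dakota", "virginia:chincoteague",
         "washington:whidbey island", "new york:main")

# value = TRUE returns the matching strings instead of their indices
kept <- grep(":main|^[^:]*$", vec, value = TRUE)
kept
# [1] "new jersey"    "south dakota"  "new york:main"

# Stricter variant (assumption): nothing may follow "main".
# "ohio:mainland" is a hypothetical extra entry, not from the question.
strict <- grep(":main$|^[^:]*$", c(vec, "ohio:mainland"), value = TRUE)
strict
# [1] "new jersey"    "south dakota"  "new york:main"
```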

Upvotes: 6
