I'm having trouble with this regular expression. Consider the following vector:
> vec <- c("new jersey", "south dakota", "virginia:chincoteague",
"washington:whidbey island", "new york:main")
Of the strings that contain a colon (:), I would like to keep only those with main after the colon, resulting in
[1] "new jersey" "south dakota" "new york:main"
So far, I've only been able to get there with this ugly nested nightmare, which is quite obviously far from optimal.
> g1 <- grep(":", vec)
> vec[ -g1[grep("main", grep(":", vec, value = TRUE), invert = TRUE)] ]
# [1] "new jersey" "south dakota" "new york:main"
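The nested expression can be unpacked into named steps to see what each grep() call contributes. This is a sketch; the intermediate names (has_colon, colon_strings, not_main, drop) and the second input vector vec2 are hypothetical and only for illustration. It also shows an edge case of negative indexing: when no colon string lacks main, the index vector is empty, and vec[-integer(0)] selects nothing instead of everything.

```r
vec <- c("new jersey", "south dakota", "virginia:chincoteague",
         "washington:whidbey island", "new york:main")

# Positions of every string that contains a colon: 3, 4, 5
has_colon <- grep(":", vec)

# The colon-containing strings themselves
colon_strings <- grep(":", vec, value = TRUE)

# Positions *within colon_strings* of those that do NOT mention "main": 1, 2
not_main <- grep("main", colon_strings, invert = TRUE)

# Map those back to positions in vec: 3, 4
drop <- has_colon[not_main]

result <- vec[-drop]
print(result)

# Hypothetical edge case: every colon string contains "main",
# so drop2 is integer(0) and vec2[-drop2] returns character(0)
vec2 <- c("new jersey", "new york:main")
drop2 <- grep(":", vec2)[grep("main", grep(":", vec2, value = TRUE), invert = TRUE)]
print(length(vec2[-drop2]))

# A logical index avoids the empty-negative-index pitfall
print(vec2[!seq_along(vec2) %in% drop2])
```

Using a logical index (or checking length(drop) first) keeps the whole vector when there is nothing to remove.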
How can I write a single regular expression that keeps :main but removes the other strings containing a colon?
You can use this single simple regex:
^[^:]+(?::main.*)?$
In R, pass perl = TRUE to grepl() and use the logical result to subset the vector:
vec[grepl("^[^:]+(?::main.*)?$", vec, perl = TRUE)]
Explanation
- ^ anchor asserts that we are at the beginning of the string
- [^:]+ matches one or more characters that are not a colon
- (?::main.*)? is an optional, non-capturing group that matches a colon, then main, then any characters that follow
- $ anchor asserts that we are at the end of the string
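A quick way to check the pattern is to run it against every element and label each result with its input string. This is a minimal sketch in base R; the variable names pattern and keep are my own.

```r
vec <- c("new jersey", "south dakota", "virginia:chincoteague",
         "washington:whidbey island", "new york:main")

# Anchored pattern: non-colon text, optionally followed by ":main" and anything after it
pattern <- "^[^:]+(?::main.*)?$"

# perl = TRUE uses the PCRE engine for matching
keep <- grepl(pattern, vec, perl = TRUE)

# Named logical vector: one TRUE/FALSE per input string
print(setNames(keep, vec))

print(vec[keep])
```

Strings such as "virginia:chincoteague" fail because, after [^:]+ stops at the colon, the only way to continue is the optional ":main" group, and without it the $ anchor cannot be satisfied.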
Using | (pick the strings that contain :main or that do not contain : at all):
> vec <- c("new jersey", "south dakota", "virginia:chincoteague",
+ "washington:whidbey island", "new york:main")
> grep(":main|^[^:]*$", vec)
[1] 1 2 5
> vec[grep(":main|^[^:]*$", vec)]
[1] "new jersey" "south dakota" "new york:main"
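The alternation can also be split into its two branches, each evaluated with its own grepl() call and combined with the vectorised | operator. This is a sketch; using fixed = TRUE (literal matching) for branches that need no regex features is my own choice, and the names has_main, no_colon and keep are illustrative.

```r
vec <- c("new jersey", "south dakota", "virginia:chincoteague",
         "washington:whidbey island", "new york:main")

# Branch 1: the string contains the literal text ":main"
has_main <- grepl(":main", vec, fixed = TRUE)

# Branch 2: the string contains no colon at all
# (same result as the regex branch ^[^:]*$ for ordinary, non-NA strings)
no_colon <- !grepl(":", vec, fixed = TRUE)

keep <- has_main | no_colon
print(vec[keep])

# Same selection as the single alternation pattern
print(identical(keep, grepl(":main|^[^:]*$", vec)))
```

Writing the two conditions separately makes each rule easy to read and change on its own, at the cost of scanning the vector twice.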