Reputation: 274
I am trying to understand how curly braces work in R regular expressions. The help files say:
{n} The preceding item is matched exactly n times.
{n,} The preceding item is matched n or more times.
{n,m} The preceding item is matched at least n times, but not more than m times.
I have a vector like this:
b <- c("aa", "aaa", "aaaa", "aaaaa")
When I do
b[grep("a{2}", b)]
I would expect it to return only "aa" but instead I get everything. In other words, it yields exactly the same result as
b[grep("a{2,}", b)]
Why?
Upvotes: 1
Views: 1222
Reputation: 174786
Because grep looks for a match anywhere in the string, not for a match of the whole string. In the input "aaa", a{2} matches the first two a's, and likewise for all the other elements, so grep returns the indices of all the elements. To match the entire string exactly, you need to add anchors.
> b <- c("aa", "aaa", "aaaa", "aaaaa")
> b[grep("^a{2}$", b)]
[1] "aa"
^ asserts that we are at the start of the string and $ asserts that we are at the end. So the above grep returns only the index of the element that consists of exactly two a's, i.e., 1.
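To see what is going on, it can help to look at the substring that actually gets matched. The following is a minimal sketch using base R's regexpr(), regmatches() and grepl(); the variable names first_match, unanchored and anchored are made up for illustration.

```r
b <- c("aa", "aaa", "aaaa", "aaaaa")

# regexpr() locates the first match in each element and
# regmatches() extracts the matched substring.
first_match <- regmatches(b, regexpr("a{2}", b))
print(first_match)   # every element contributes the substring "aa"

# grepl() returns one logical value per element.
unanchored <- grepl("a{2}", b)    # TRUE wherever "aa" occurs somewhere
anchored   <- grepl("^a{2}$", b)  # TRUE only if the whole string is "aa"
print(unanchored)
print(anchored)
```

The unanchored pattern finds "aa" inside every element, which is why all four come back; only the anchored pattern tests the whole string.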
OR
> b <- c("aa", "aaa", "aaaa", "aaaaa")
> b[grep("\\ba{2}\\b", b)]
[1] "aa"
Adding \b word boundaries also works in this case.
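Note that the two approaches are not equivalent in general: \b only requires that the two a's form a whole word, not that they make up the whole string. A small sketch of the difference, using a made-up vector x that is not part of the original question:

```r
# Hypothetical input: the third element contains "aa" as a separate word.
x <- c("aa", "aaa", "aa bbb", "baa")

# \\b requires a word boundary on each side of the two a's.
word_bounded <- grepl("\\ba{2}\\b", x)

# ^ and $ require the two a's to be the entire string.
anchored <- grepl("^a{2}$", x)

print(x[word_bounded])  # includes "aa bbb", because "aa" is a whole word there
print(x[anchored])      # only "aa"
```

So if each element should be exactly "aa" and nothing else, the anchored pattern is probably the safer choice.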
Upvotes: 4