Pixelkracht

Reputation: 274

How do curly braces in R regex work?

I am trying to understand how curly braces work in R regular expressions. The help files say:

{n} The preceding item is matched exactly n times.

{n,} The preceding item is matched n or more times.

{n,m} The preceding item is matched at least n times, but not more than m times.

I have a vector like this:

b <- c("aa", "aaa", "aaaa", "aaaaa")

When I do

b[grep("a{2}", b)]

I would expect it to return only "aa" but instead I get everything. In other words, it yields exactly the same result as

b[grep("a{2,}", b)]

Why?

Upvotes: 1

Views: 1222

Answers (1)

Avinash Raj

Reputation: 174786

Because a{2} only needs to find two consecutive a's somewhere in the string. In the input "aaa", for example, a{2} matches the first two a's, and the same happens for every other element, so grep returns the indices of all the elements. To match the whole string exactly, you need to add anchors.

> b <- c("aa", "aaa", "aaaa", "aaaaa")
> b[grep("^a{2}$", b)]
[1] "aa"

^ asserts that we are at the start of the string and $ asserts that we are at the end. So the grep call above returns only the index of the element that consists of exactly two a's, i.e. 1.
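To see what the unanchored pattern actually matches, here is a minimal sketch using base R's regexpr() and regmatches(). The variable names first_match, unanchored and anchored are just illustrative.

```r
b <- c("aa", "aaa", "aaaa", "aaaaa")

# Extract the first substring that a{2} matches in each element.
# Every element contains at least two consecutive a's, so each one matches.
first_match <- regmatches(b, regexpr("a{2}", b))
print(first_match)   # "aa" "aa" "aa" "aa"

# Without anchors the pattern matches anywhere inside the string.
unanchored <- grepl("a{2}", b)
print(unanchored)    # TRUE TRUE TRUE TRUE

# With ^ and $ the two a's must span the entire string.
anchored <- grepl("^a{2}$", b)
print(anchored)      # TRUE FALSE FALSE FALSE
```

The quantifier itself is doing what the help page says; it is the search for a match anywhere in the string that makes the longer elements qualify.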

OR

> b <- c("aa", "aaa", "aaaa", "aaaaa")
> b[grep("\\ba{2}\\b", b)]
[1] "aa"

Adding the \b word boundary also works in this case, because every element consists only of word characters.
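The two approaches are not interchangeable in general. As a rough sketch, assuming a hypothetical vector x that also contains strings with spaces, the word-boundary version accepts any string that contains "aa" as a separate word, while the anchored version accepts only the string "aa" itself:

```r
# Hypothetical input, not from the question: two elements contain spaces.
x <- c("aa", "aaa", "aa bb", "b aa")

# Anchors: the whole string must be exactly two a's.
anchored_hits <- x[grepl("^a{2}$", x)]
print(anchored_hits)   # "aa"

# Word boundaries: "aa" only has to appear as a standalone word.
boundary_hits <- x[grepl("\\ba{2}\\b", x)]
print(boundary_hits)   # "aa" "aa bb" "b aa"
```

So if the goal is an exact whole-string match, the anchored pattern is the safer choice.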

Upvotes: 4
