rar20

Reputation: 51

R Character classes

Could anybody explain why "aba12" shows up, when I have specified {2}?

strings <- c("Ab12", "aba12", "BA12", "A 12b", "B!", "d", "  ab")

grep("^[[:alpha:]]{2}", strings, value=TRUE)

Upvotes: 3

Views: 105

Answers (1)

Frank

Reputation: 66819

You can use ...

grep("^[[:alpha:]]{2}[^[:alpha:]]", strings, value=TRUE)

# [1] "Ab12" "BA12"

[...] enumerates the accepted characters, and [^...] negates the set so that it matches any character not listed. Further, from @Mako212:

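^[[:alpha:]]{2} [...] tells the regex engine to match the beginning of the string, then exactly two alphabetic characters (A-Z/a-z in ASCII; the exact set depends on the locale). It asserts nothing about the remainder of the string. The rest of the string is left unconstrained because there are no remaining criteria to match, which is why "aba12" is returned: its first two characters, "ab", already satisfy the pattern.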
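
To see this directly, here is a small sketch (not part of the original comment) that extracts the portion of each string the pattern actually consumes. The variable names `pattern`, `hits`, and `matched` are just illustrative.

```r
strings <- c("Ab12", "aba12", "BA12", "A 12b", "B!", "d", "  ab")
pattern <- "^[[:alpha:]]{2}"

# regexpr() gives the position and length of the first match in each string
# (-1 where there is no match); regmatches() pulls out the matched text.
hits <- regexpr(pattern, strings)
matched <- regmatches(strings, hits)

print(matched)
# [1] "Ab" "ab" "BA"
```

For "aba12" only "ab" is consumed; the trailing "a12" is simply ignored, so grep() still reports the string as a match.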

My answer above expects a non-alpha character following the initial two. From MrFlick's comment:

If you also want to match "AB", then use

grep("^[[:alpha:]]{2}([^[:alpha:]]|$)", strings, value=TRUE) 

to match a non-alpha character or end of string.
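
A quick comparison of the two patterns, as a sketch with "AB" appended to the sample vector (the names `strings2`, `strict`, and `relaxed` are made up for illustration). The last call is an alternative I am assuming you might prefer, not something from the comments above: a negative lookahead, which needs perl=TRUE.

```r
strings2 <- c("Ab12", "aba12", "BA12", "A 12b", "B!", "d", "  ab", "AB")

# Requires a non-alpha character after the first two letters, so "AB" is dropped.
strict <- grep("^[[:alpha:]]{2}[^[:alpha:]]", strings2, value = TRUE)

# Accepts either a non-alpha character or the end of the string.
relaxed <- grep("^[[:alpha:]]{2}([^[:alpha:]]|$)", strings2, value = TRUE)

# Alternative (assumption, PCRE only): "not followed by another letter".
lookahead <- grep("^[[:alpha:]]{2}(?![[:alpha:]])", strings2,
                  value = TRUE, perl = TRUE)

print(strict)
# [1] "Ab12" "BA12"
print(relaxed)
# [1] "Ab12" "BA12" "AB"
```

The lookahead version returns the same strings as `relaxed` here; it just avoids consuming the third character.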

Upvotes: 3
