Reputation: 4380
The regular expression pattern ^[A-Z]{2,4}$ specifies that the whole string, from its first character to its last, must consist of uppercase letters only, and that there must be exactly two, three, or four of them. Anything else will not be considered valid:
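filter_symbols <- function(symbols) {
  # regexpr() returns the position of the first match (1 here, because of ^), or -1 if there is none
  valid <- regexpr("^[A-Z]{2,4}$", symbols)
  # keep only the matching symbols, sorted alphabetically
  return(sort(symbols[valid == 1]))
}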
filter_symbols(c("MOT", "CVX", "123", "GOG2", "XLE", "AAPL", "AAPLS", "A"))
...and it works like a charm:
[1] "AAPL" "CVX" "MOT" "XLE"
Now when you test the same pattern here (and there are many similar online regex testers out there):
^[A-Z]{2,4}$
...you don't get any match (not even when you put each word on its own line). Why does it behave differently in the two cases?
Upvotes: 1
Views: 88
Reputation: 70722
Debuggex yields no match because you don't have the correct modifier turned on.
In almost all regular expression engines, the anchors ^ and $ only match (respectively) at the beginning and the end of the string by default. If you want them to match at the beginning/end of each line (not only the beginning/end of the string), turn on the m (multi-line) modifier, which causes this behavior.
You can see the difference with this modifier turned on in the Debuggex Demo.
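The same effect can be reproduced in R itself. The following is a minimal sketch, assuming the tester's input is one string with the symbols separated by newlines; the variable names (`input`, `pattern`, `without_m`, `with_m`) are made up for illustration. It uses `perl = TRUE` so that the inline `(?m)` flag is interpreted by PCRE:

```r
# One string containing all symbols, one per line (roughly what an online tester sees)
input <- "MOT\nCVX\n123\nGOG2\nXLE\nAAPL\nAAPLS\nA"
pattern <- "^[A-Z]{2,4}$"

# Without the multi-line modifier, ^ and $ anchor to the whole string, so nothing matches
without_m <- grepl(pattern, input, perl = TRUE)
print(without_m)  # FALSE

# With (?m), ^ and $ also match at the start and end of every line
hits <- gregexpr(paste0("(?m)", pattern), input, perl = TRUE)
with_m <- regmatches(input, hits)[[1]]
print(with_m)     # matches: MOT, CVX, XLE, AAPL
```

The results come back in input order here, because nothing sorts them the way `filter_symbols` does.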
Upvotes: 2
Reputation: 51330
By default, ^ matches only at the start of the string, and $ matches only at the end.
Debuggex and similar sites pass the whole input textarea as a single input string, so your regex was actually being matched against MOT\ncvx\n123...AAPL.
Enable the m (multiline) flag: in this mode, ^ and $ match at the start/end of each line, which lets you test multiple inputs at once.
See the updated Debuggex demo
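To make the contrast with the R code in the question concrete: R applies the pattern to every element of the character vector separately, whereas a tester gets one joined string. A rough sketch of that difference follows, assuming the symbols are joined by "\n" to imitate the textarea; the names `joined`, `per_element`, `whole_text` and `per_line` are illustrative only:

```r
symbols <- c("MOT", "CVX", "123", "GOG2", "XLE", "AAPL", "AAPLS", "A")
pattern <- "^[A-Z]{2,4}$"

# R tests each vector element on its own, so ^ and $ anchor to each symbol
per_element <- grepl(pattern, symbols, perl = TRUE)
print(symbols[per_element])  # matches: MOT, CVX, XLE, AAPL

# Imitate the tester: a single string with embedded newlines
joined <- paste(symbols, collapse = "\n")
whole_text <- grepl(pattern, joined, perl = TRUE)
print(whole_text)            # FALSE

# Splitting on newlines gives one string per symbol again, with no flag needed
lines <- strsplit(joined, "\n", fixed = TRUE)[[1]]
per_line <- grepl(pattern, lines, perl = TRUE)
print(lines[per_line])       # matches: MOT, CVX, XLE, AAPL
```

Splitting the input first is an alternative to the m flag when the data is already line-oriented; which one fits better depends on how the input arrives.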
Upvotes: 2