Reputation: 19648
The goal is to remove all non-capital letters in a string, and I managed to find a regular expression solution without fully understanding it.
> gsub("[^::A-Z::]","", "PendingApproved")
[1] "PA"
I tried to read the regex documentation in R, but the double colon isn't really covered there. [] encloses the characters to match, A-Z means upper case, and ^ means not. Can someone help me understand what the double colons are doing there?
Upvotes: 4
Views: 965
Reputation: 887531
We can use str_remove_all from stringr to remove every run of lowercase letters:
library(stringr)
str_remove_all("PendingApproved", "[a-z]+")
#[1] "PA"
Upvotes: 1
Reputation: 133630
To remove all lowercase letters, use the following:
gsub("[a-z]","", "PendingApproved")
Outside a character class, ^ anchors the match to the start of the string, so
gsub("^[a-z]","", "PendingApproved")
will not remove any letters from your test string, because the string does not start with a lowercase letter.
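To see the anchor at work, here is a small sketch. The second input, "pendingApproved", is a made-up string for illustration and is not from the question.

```r
# Outside a character class, ^ ties the match to the start of the string.
unchanged <- gsub("^[a-z]", "", "PendingApproved")  # starts with "P", so nothing matches
trimmed   <- gsub("^[a-z]", "", "pendingApproved")  # hypothetical input starting with "p"

print(unchanged)
print(trimmed)
```

Only the leading "p" of the second string should be dropped, since no later position counts as the start of the string.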
EDIT: As per Tim's comment, ^ also works as negation when it appears inside a character class. So let's say we want to remove every character that is not a letter (here, the digits) from a value that mixes letters and digits; then the following may help.
gsub("[^[:alpha:]]","", "PendingApproved1213133")
Here the pattern tells gsub NOT to substitute alphabetic characters: ^ works as negation when it is the first character inside a character class.
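A quick way to check the effect of the negation is to run the same class with and without the leading ^. This is only a sketch; the variable names are mine.

```r
mixed <- "PendingApproved1213133"

# With ^ the class is negated: everything that is NOT a letter is removed.
letters_only <- gsub("[^[:alpha:]]", "", mixed)

# Without ^ the class matches the letters themselves, so they are removed instead.
digits_only <- gsub("[[:alpha:]]", "", mixed)

print(letters_only)
print(digits_only)
```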
Upvotes: 2
Reputation: 522074
As far as I know, you don't need those double colons:
gsub("[^A-Z]", "", "PendingApproved")
[1] "PA"
Your current pattern says to remove any character which is not A-Z or a colon (:). Repeating the colon twice on each side of the character range does not add any extra logic.
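One way to convince yourself that the colons are just literal characters is to try an input that actually contains one. The string "Pending:Approved" below is a hypothetical input, not from your question.

```r
with_colon <- "Pending:Approved"  # hypothetical input containing a colon

doubled <- gsub("[^::A-Z::]", "", with_colon)  # the pattern from the question
single  <- gsub("[^A-Z:]", "", with_colon)     # same class with the colon written once
plain   <- gsub("[^A-Z]", "", with_colon)      # no colon in the class at all

print(doubled)
print(single)
print(plain)
```

The first two calls should agree and keep the colon, while the last one removes it along with the lowercase letters.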
Perhaps the author of the code you are using confused the double colons with R's own regex syntax for named character classes. For example, we could have written the above as:
gsub("[^[:upper:]]","", "PendingApproved")
where [:upper:] means all upper case letters.
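For this plain ASCII input, the named class and the explicit range should give the same result. A minimal sketch to compare them, assuming an ordinary locale (ranges such as A-Z can depend on the locale, which is one reason the named class is often preferred):

```r
x <- "PendingApproved"

by_class <- gsub("[^[:upper:]]", "", x)  # named class, written inside the outer brackets
by_range <- gsub("[^A-Z]", "", x)        # explicit range

print(identical(by_class, by_range))
```

Note that [:upper:] only has its special meaning inside a bracket expression, which is why the pattern has two sets of brackets.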
Upvotes: 4