Reputation: 19648

Double Colon in R Regular Expression

The goal is to remove all non-capital letters from a string. I managed to find a regular expression solution without fully understanding it.

> gsub("[^::A-Z::]","", "PendingApproved")
[1] "PA"

I tried to read the documentation of regex in R but the double colon isn't really covered there.

I know that [] encloses the characters to match, A-Z means upper-case letters, and ^ means "not". Can someone help me understand what the double colons are doing there?

Upvotes: 4

Answers (3)

akrun

Reputation: 887531

We can use str_remove_all from stringr

library(stringr)
str_remove_all("PendingApproved", "[a-z]+")
#[1] "PA"

Upvotes: 1

RavinderSingh13

Reputation: 133630

To remove all lowercase letters, use the following:

gsub("[a-z]","", "PendingApproved")

Outside a character class, ^ anchors the match to the start of the string, so

gsub("^[a-z]","", "PendingApproved")

will not remove any letters from your test string, because the string does not start with a lowercase letter.

EDIT: As per Tim's comment, ^ also works as negation inside a character class. So let's say we want to remove everything that is not a letter, such as the digits in a value that mixes letters and digits; then the following may help.

gsub("[^[:alpha:]]","", "PendingApproved1213133")

This tells gsub to leave alphabetic characters alone and remove everything else. Inside a character class, a leading ^ works as negation.
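As a rough sketch of the two meanings of ^ described above, using only base R's gsub (the input strings and variable names here are made up for illustration):

```r
# Outside a character class, ^ anchors the match to the start of the string.
anchored <- gsub("^[a-z]", "", "pendingApproved")  # drops only the leading "p"
no_match <- gsub("^[a-z]", "", "PendingApproved")  # starts with "P", so nothing is removed

# Inside a character class, a leading ^ negates the class.
letters_only <- gsub("[^[:alpha:]]", "", "PendingApproved1213133")

print(anchored)      # "endingApproved"
print(no_match)      # "PendingApproved"
print(letters_only)  # "PendingApproved"
```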

Upvotes: 2

Tim Biegeleisen

Reputation: 522074

As far as I know, you don't need those double colons:

gsub("[^A-Z]", "", "PendingApproved")
[1] "PA"

Your current pattern says to remove any character which is not A-Z or a colon (:). Repeating the colons on each side of the character range does not add any extra logic, because listing a character more than once inside a character class changes nothing.
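One way to see this is to test against an input that actually contains a colon. The string "Pending:Approved" below is an invented test value, not something from the question:

```r
# The colons inside the class are literal characters, so a colon in the input survives.
original <- gsub("[^::A-Z::]", "", "Pending:Approved")

# A single colon in the class behaves the same as four of them.
single <- gsub("[^:A-Z]", "", "Pending:Approved")

# Without any colon in the class, the colon is removed like any other non-capital.
no_colon <- gsub("[^A-Z]", "", "Pending:Approved")

print(original)  # "P:A"
print(single)    # "P:A"
print(no_colon)  # "PA"
```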

Perhaps the author of the code you are using confused the double colons with R's own regex syntax for named character classes. For example, we could have written the above as:

gsub("[^[:upper:]]","", "PendingApproved")

where [:upper:] means all upper case letters.

Upvotes: 4
