Rich Scriven

Reputation: 99361

Can the pattern argument in ls() be inverted?

I'm trying to get a vector of all the function names in the base package that contain only a . as punctuation, or no punctuation at all. I'd like to do it using only the ls() function.

ls() takes a pattern argument that is defined as

an optional regular expression. Only names matching pattern are returned. glob2rx can be used to convert wildcard patterns to regular expressions.

I'm trying to invert my regular expression, but I also want to keep the functions whose names contain a dot (.). Here's an example of some of the ones I don't want.

lsBase1 <- ls("package:base", pattern = "[[:punct:]]")
head(lsBase1)
# [1] "^"   "~"   "<"   "<<-" "<="  "<-" 

I want the inverted version of this, as if I were using invert = TRUE in grep, or by doing the following. But I also want to keep the functions whose only punctuation is a dot (.).

lsBase2 <- ls("package:base")
lsBase2 <- lsBase2[!grepl("[[:punct:]]", lsBase2)]
head(lsBase2)
# [1] "abbreviate"      "abs"             "acos"            "acosh"          
# [5] "addNA"           "addTaskCallback"

Is there a way to invert the pattern argument in ls()? Or, more generally, can I invert the regular expression [[:punct:]] so that it returns the opposite, but still includes the names whose only punctuation is a dot (.)?

Note: More than one . is fine.

Another example of what I want: I do want is.vector, but I don't want [.data.frame.

Upvotes: 3

Views: 362

Answers (3)

hwnd

Reputation: 70732

The following will work for what you are asking, assuming lsBase2 holds the full, unfiltered result of ls("package:base").

lsBase2[grepl('^([^\\pP\\pS]|\\.)+$', lsBase2, perl = TRUE)]

Edit: Or you could simply use the following, which returns 1029 results on R version 3.1.1:

ls("package:base", pattern = "^[a-zA-Z0-9.]+$")

Upvotes: 3

Josh O'Brien

Reputation: 162411

I believe this is what you are looking for:

m <- ls("package:base", pattern="^(\\.|[^[:punct:]])*$")

The | is regex for "OR", so in words, it says something like "match a sequence of characters, running from the start of the string to its end, each of which is either a ., OR not a punctuation character".
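As a rough illustration of how that pattern behaves inside ls() itself, here is a minimal sketch that uses a small throwaway environment instead of package:base. The environment e and the names in sample_names are hypothetical, chosen only to mirror the examples from the question.

```r
# Hypothetical environment standing in for "package:base"
e <- new.env()
sample_names <- c("is.vector", "abs", "[.data.frame",
                  "package_version", "as.data.frame")
for (nm in sample_names) assign(nm, NULL, envir = e)

# Keep names made up only of dots and non-punctuation characters
kept <- ls(e, pattern = "^(\\.|[^[:punct:]])*$")
kept
# "[.data.frame" is dropped because of "[", "package_version" because of "_"
```

Because the alternation is tried character by character between ^ and $, a single disallowed character anywhere in the name is enough to reject it.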


To confirm that this works:

## Dissolve the matched strings and check for any verboten characters.  
sort(unique(unlist(strsplit(m, ""))))
#  [1] "." "0" "1" "2" "3" "4" "8" "a" "A" "b" "B" "c" "C" "d" "D" "e"
# [17] "E" "f" "F" "g" "G" "h" "H" "i" "I" "j" "J" "k" "K" "l" "L" "m"
# [33] "M" "n" "N" "o" "O" "p" "P" "q" "Q" "r" "R" "s" "S" "t" "T" "u"
# [49] "U" "v" "V" "w" "W" "x" "X" "y" "Y" "z"

## Have a look at (at least a few of) the names _excluded_ by the regex:
n <- setdiff(ls("package:base"), m)
sample(n, 10)
# [1] "names<-.POSIXlt" "[[<-.data.frame" "!.hexmode"       "$<-"            
# [5] "<-"              "&&"              "%*%"             "package_version"
# [9] "$"               "regmatches<-"   

Upvotes: 5

Matthew Lundberg

Reputation: 42679

This is easy if you think about it in steps. First remove the . characters, then scan for additional punctuation:

lsBase2[!grepl('[[:punct:]]', gsub('[.]', '', lsBase2))]
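To make the two steps visible, here is a small sketch on a hand-picked vector. The names in nms are hypothetical sample input (not the full listing of package:base), and the intermediate variables no_dots and has_other are introduced only for illustration.

```r
# Hypothetical sample of function names
nms <- c("is.vector", "abs", "[.data.frame",
         "names<-.POSIXlt", "package_version")

# Step 1: strip the dots, which are the only punctuation we allow
no_dots <- gsub("[.]", "", nms)

# Step 2: flag anything that still contains punctuation
has_other <- grepl("[[:punct:]]", no_dots)

# Index the original names, so the dots are preserved in the result
result <- nms[!has_other]
result
```

Note that this approach assumes lsBase2 is the full listing from ls("package:base"); it filters after the fact rather than through the pattern argument of ls().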

Upvotes: 0
