Reputation: 99361
I'm trying to get a vector of all the function names in the base package that contain only . as punctuation, or no punctuation at all. I'd like to do it using only the ls() function.
ls() takes a pattern argument, which is documented as
"an optional regular expression. Only names matching pattern are returned. glob2rx can be used to convert wildcard patterns to regular expressions."
I'm trying to invert my regular expression, but I also want to keep the functions that contain a . character. Here's an example of some of the ones I don't want.
lsBase1 <- ls("package:base", pattern = "[[:punct:]]")
head(lsBase1)
# [1] "^" "~" "<" "<<-" "<=" "<-"
I want the inverted version of this, as if I were using invert = TRUE in grep, or doing the following. But I also want the functions whose only punctuation is the . character.
lsBase <- ls("package:base")
lsBase2 <- lsBase[!grepl("[[:punct:]]", lsBase)]
head(lsBase2)
# [1] "abbreviate" "abs" "acos" "acosh"
# [5] "addNA" "addTaskCallback"
Is there a way to invert the pattern argument in ls()? Or, more generally, can I invert the regular expression [[:punct:]] so it returns the opposite, but still includes those matches that contain only . as punctuation?
Note: More than one . is fine.
Another example of what I want: yes, I want is.vector, but no, I don't want [.data.frame.
Upvotes: 3
Views: 362
Reputation: 70732
The following will work for what you are asking.
lsBase2 <- ls("package:base")  # full, unfiltered list of names
lsBase2[grepl('^([^\\pP\\pS]|\\.)+$', lsBase2, perl = TRUE)]
Edit: Or you could simply use the following, which returns 1029 results on R version 3.1.1:
ls("package:base", pattern = "^[a-zA-Z0-9.]+$")
Upvotes: 3
Reputation: 162411
I believe this is what you are looking for:
m <- ls("package:base", pattern="^(\\.|[^[:punct:]])*$")
The | is regex for "OR", so in words, the pattern says something like "match a sequence of characters, running from the start of the string to its end, each of which is either a . or not a punctuation character".
To confirm that this works:
## Dissolve the matched strings and check for any verboten characters.
sort(unique(unlist(strsplit(m, ""))))
# [1] "." "0" "1" "2" "3" "4" "8" "a" "A" "b" "B" "c" "C" "d" "D" "e"
# [17] "E" "f" "F" "g" "G" "h" "H" "i" "I" "j" "J" "k" "K" "l" "L" "m"
# [33] "M" "n" "N" "o" "O" "p" "P" "q" "Q" "r" "R" "s" "S" "t" "T" "u"
# [49] "U" "v" "V" "w" "W" "x" "X" "y" "Y" "z"
## Have a look at (at least a few of) the names _excluded_ by the regex:
n <- setdiff(ls("package:base"), m)
sample(n, 10)
# [1] "names<-.POSIXlt" "[[<-.data.frame" "!.hexmode" "$<-"
# [5] "<-" "&&" "%*%" "package_version"
# [9] "$" "regmatches<-"
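For a quicker sanity check that doesn't depend on the contents of the base package, the same pattern can be tried on a small hand-picked vector. This is only a sketch: the vector x and the name keep are made up for illustration, and the pattern is the one from above.

```r
# Hypothetical sample of names, chosen by hand for illustration only.
x <- c("is.vector", "[.data.frame", "abs", "<-",
       "package_version", "as.data.frame")

# Same pattern as above: every character is either a literal "."
# or something that is not in the [:punct:] class.
keep <- grepl("^(\\.|[^[:punct:]])*$", x)

# Keeps "is.vector", "abs" and "as.data.frame"; drops the rest,
# including "package_version" because "_" counts as punctuation.
x[keep]
```

Note that * also lets the empty string match; use + instead if that matters for your input.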
Upvotes: 5
Reputation: 42679
This is easy if you think about it in steps. First remove the . characters, then scan for additional punctuation:
lsBase2 <- ls("package:base")  # start from the full list of names
lsBase2[!grepl('[[:punct:]]', gsub('[.]', '', lsBase2))]
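Spelled out as two separate steps on a toy vector, the idea looks roughly like this. The names nms, no_dots and ok are placeholders of mine, not anything from the question.

```r
# Hypothetical sample of names, for illustration only.
nms <- c("is.vector", "[.data.frame", "abs", "<-",
         "package_version", "as.data.frame")

# Step 1: strip every literal "." so dots can no longer trigger a match.
no_dots <- gsub(".", "", nms, fixed = TRUE)

# Step 2: flag names that still contain some other punctuation character.
ok <- !grepl("[[:punct:]]", no_dots)

# Keeps "is.vector", "abs" and "as.data.frame".
nms[ok]
```

Here fixed = TRUE does the same job as the '[.]' character class above: both treat the dot as a literal character rather than as "any character".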
Upvotes: 0