Reputation:
I want to count the number of characters in a string (excluding spaces) and I'd like to know if my approach can be improved.
Suppose I have:
x <- "hello to you"
I know nchar()
will give me the number of characters in a string (including spaces):
> nchar(x)
[1] 12
But I'd like to return the following (excluding spaces):
[1] 10
To this end, I've done the following:
> nchar(gsub(" ", "",x))
[1] 10
My worry is the gsub()
will take a long time over many strings. Is this the correct way to approach this, or is there a type of nchar'esque function that will return the number of characters without counting spaces?
Thanks in advance.
Upvotes: 4
Views: 7288
Reputation: 5169
Indeed, stringi
seems most appropriate here. Try this:
library(stringi)
x <- "hello to you"
stri_stats_latex(x)
Result:
CharsWord CharsCmdEnvir CharsWhite Words Cmds Envirs
10 0 2 3 0 0
If you need it in a variable, you can access the parameters via regular [i], e.g.:
stri_stats_latex(x)[1]
Upvotes: 2
Reputation: 193517
Building on Richard's comment, "stringi" would be a great consideration here:
The approach could be to calculate the overall string length and subtract the number of spaces.
Compare the following.
library(stringi)
library(microbenchmark)
x <- "hello to you"
x
# [1] "hello to you"
fun1 <- function(x) stri_length(x) - stri_count_fixed(x, " ")
fun2 <- function(x) nchar(gsub(" ", "",x))
y <- paste(as.vector(replicate(1000000, x, TRUE)), collapse = " ")
microbenchmark(fun1(x), fun2(x))
# Unit: microseconds
# expr min lq mean median uq max neval
# fun1(x) 5.560 5.988 8.65163 7.270 8.1255 44.047 100
# fun2(x) 9.408 9.837 12.84670 10.691 12.4020 57.732 100
microbenchmark(fun1(y), fun2(y), times = 10)
# Unit: milliseconds
# expr min lq mean median uq max neval
# fun1(y) 68.22904 68.50273 69.6419 68.63914 70.47284 75.17682 10
# fun2(y) 2009.14710 2011.05178 2042.8123 2030.10502 2079.87224 2090.09142 10
Upvotes: 8