Count characters in a string (excluding spaces) in R?

Question

I want to count the number of characters in a string (excluding spaces) and I'd like to know if my approach can be improved.

Suppose I have:

x <- "hello to you"

I know nchar() will give me the number of characters in a string (including spaces):

> nchar(x)
[1] 12

But I'd like to return the following (excluding spaces):

[1] 10

To this end, I've done the following:

> nchar(gsub(" ", "", x))
[1] 10
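gsub() and nchar() are both vectorised, so the same expression can process a whole character vector in one call rather than looping over strings. Adding fixed = TRUE makes gsub() match the space literally instead of as a regular expression, which is a common way to speed it up. A minimal sketch; the vector `strings` is a hypothetical example input:

```r
# Hypothetical input: several strings processed in a single vectorised call
strings <- c("hello to you", "a b c", "nospaces", "")

# fixed = TRUE treats " " as a literal string, not a regular expression
no_space_counts <- nchar(gsub(" ", "", strings, fixed = TRUE))
print(no_space_counts)
# [1] 10  3  8  0
```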

My worry is that gsub() will be slow when applied to many strings. Is this the right approach, or is there an nchar()-like function that returns the number of characters without counting spaces?
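One detail worth checking is what "spaces" should mean. Matching " " removes only the literal space character; if tabs and newlines should also be excluded, the POSIX class [[:space:]] covers them. A small sketch with a hypothetical string `z`:

```r
# Hypothetical string containing a tab and a newline as well as a space
z <- "hello\tto\nyou there"

# Literal space only: the tab and newline are still counted
spaces_only <- nchar(gsub(" ", "", z, fixed = TRUE))

# [[:space:]] also matches tabs, newlines and other whitespace
all_whitespace <- nchar(gsub("[[:space:]]", "", z))

print(spaces_only)     # [1] 17
print(all_whitespace)  # [1] 15
```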

Thanks in advance.

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

Building on Richard's comment, the "stringi" package is worth considering here.

The approach is to compute the overall string length and subtract the number of spaces, rather than building a new string without them.
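The same length-minus-spaces idea can also be written in base R without stringi. This is a sketch using gregexpr() and regmatches() to count the spaces in each string; the vector `strings` is hypothetical:

```r
strings <- c("hello to you", "nospaces")

# gregexpr() finds every literal space (match position -1 when there is none)
space_matches <- gregexpr(" ", strings, fixed = TRUE)

# regmatches() turns the positions into matched substrings, so lengths()
# gives the number of spaces per string (0 when there were no matches)
n_spaces <- lengths(regmatches(strings, space_matches))

no_space_counts <- nchar(strings) - n_spaces
print(no_space_counts)
# [1] 10  8
```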

Compare the following.

library(stringi)
library(microbenchmark)

x <- "hello to you"
x
# [1] "hello to you"
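# stringi: total length minus the number of literal spaces
fun1 <- function(x) stri_length(x) - stri_count_fixed(x, " ")
# base R: strip the spaces, then count what is left
fun2 <- function(x) nchar(gsub(" ", "", x))
# one long string: 1,000,000 copies of x joined by five spaces
y <- paste(as.vector(replicate(1000000, x, TRUE)), collapse = "     ")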

microbenchmark(fun1(x), fun2(x))
# Unit: microseconds
#     expr   min    lq     mean median      uq    max neval
#  fun1(x) 5.560 5.988  8.65163  7.270  8.1255 44.047   100
#  fun2(x) 9.408 9.837 12.84670 10.691 12.4020 57.732   100
microbenchmark(fun1(y), fun2(y), times = 10)
# Unit: milliseconds
#     expr        min         lq      mean     median         uq        max neval
#  fun1(y)   68.22904   68.50273   69.6419   68.63914   70.47284   75.17682    10
#  fun2(y) 2009.14710 2011.05178 2042.8123 2030.10502 2079.87224 2090.09142    10
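Much of fun2()'s cost on long input comes from the regular-expression engine. A variant not included in the benchmark above passes fixed = TRUE to gsub(). The sketch below uses base R's system.time() so it runs without extra packages, and it uses a smaller string than the original benchmark; no timings are quoted because they depend on the machine:

```r
x <- "hello to you"
# 100,000 copies of x joined by five spaces (smaller than the original y)
y <- paste(rep(x, 1e5), collapse = "     ")

# Regex-based version from the benchmark above
fun2 <- function(x) nchar(gsub(" ", "", x))
# Same idea, but gsub() performs a literal (non-regex) match
fun3 <- function(x) nchar(gsub(" ", "", x, fixed = TRUE))

# Both versions must agree before comparing speed
stopifnot(identical(fun2(y), fun3(y)))

# Rough timings; compare the "elapsed" values on your own machine
system.time(fun2(y))
system.time(fun3(y))
```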
