unitedsaga

Reputation: 111

Extracting string from a vector without using a loop

I have a character vector from which I want to extract some strings. I can do it with a loop, but I was wondering whether the same can be done without one. Below is an example vector along with the code I have.

egVec = c("a - (2),bewc", "c,d,e","efd, ejw, qdn", "we3, asw - 23")

I want to extract the first comma-separated element of each string, so that the required output is:

Vec1
  [1] "a - (2)" "c" "efd" "we3"  

My code which uses a for loop:

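Vec1 = character(length(egVec))                 # preallocate the result vector
for (i in seq_along(egVec)){                    # seq_along() is safe for empty input
  SplitVec = unlist(strsplit(egVec[i], ","))    # split the i-th string on commas
  Vec1[i] = SplitVec[1]                         # keep only the first piece
}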

Upvotes: 0

Views: 159

Answers (1)

hrbrmstr

Reputation: 78792

library(purrr)
library(stringi)

egVec <- c("a - (2),bewc", "c,d,e","efd, ejw, qdn", "we3, asw - 23")

strsplit(egVec, ",") %>%
  vapply(`[`, character(1), 1)                     # type-safe base R
## [1] "a - (2)" "c"       "efd"     "we3"

strsplit(egVec, ",") %>%
  sapply(`[`, 1)                                   # non-type-safe base R
## [1] "a - (2)" "c"       "efd"     "we3"

strsplit(egVec, ",") %>%
  map_chr(1)                                       # type-safe tidyverse
## [1] "a - (2)" "c"       "efd"     "we3"

stri_split_fixed(egVec, ",", 2, simplify=TRUE)[,1] # stringi one-liner splitting strings
## [1] "a - (2)" "c"       "efd"     "we3"

gsub(",.*$", "", egVec)                            # base R one-liner string replacing
## [1] "a - (2)" "c"       "efd"     "we3"

stri_replace_first_regex(egVec, ",.*$", "")        # stringi one-liner string replacing
## [1] "a - (2)" "c"       "efd"     "we3"

Benchmark:

library(microbenchmark)
library(ggplot2)

microbenchmark(
  vapply=strsplit(egVec, ",") %>% vapply(`[`, character(1), 1),
  sapply=strsplit(egVec, ",") %>% sapply(`[`, 1),
  map_chr=strsplit(egVec, ",") %>% map_chr(1),
  stri_split=stri_split_fixed(egVec, ",", 2, simplify=TRUE)[,1] ,
  gsub=gsub(",.*$", "", egVec),
  stri_replace=stri_replace_first_regex(egVec, ",.*$", "")
) -> mb

mb
## Unit: microseconds
##          expr     min       lq      mean   median       uq      max neval cld
##        vapply 109.657 140.6025 169.51454 159.9715 181.4645 1102.825   100   b
##        sapply 125.206 147.8225 176.49470 172.4420 196.8730  396.046   100   b
##       map_chr 123.767 145.7385 179.12090 177.9535 198.2710  325.098   100   b
##    stri_split   6.626  12.7120  15.60843  14.6755  17.6315   68.299   100  a 
##          gsub  13.912  20.5335  24.99184  23.8180  28.1800   45.563   100  a 
##  stri_replace  17.532  25.8590  30.81416  28.9465  31.0715  170.869   100  a

autoplot(mb)


(Not an optimal test harness for the benchmark, but I figured stri_split… would come out on top.)
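
One way to make the harness a little less toy-like is to run the same approaches on a longer vector and confirm they still agree before timing them. The following is only a sketch in base R: `bigVec`, `first_by_split` and `first_by_sub` are hypothetical names, the repeat count of 25000 is arbitrary, and `fixed = TRUE` is an addition that was not in the calls above. Timings depend on the machine, so none are shown.

```r
egVec <- c("a - (2),bewc", "c,d,e", "efd, ejw, qdn", "we3, asw - 23")

# Hypothetical larger input: the example repeated to give 100,000 strings
bigVec <- rep(egVec, times = 25000)

# Split on a literal comma, then take the first piece of each result
first_by_split <- function(x) {
  vapply(strsplit(x, ",", fixed = TRUE), `[`, character(1), 1)
}

# Delete everything from the first comma to the end of the string
first_by_sub <- function(x) {
  sub(",.*$", "", x)
}

# Both approaches should return the same result on this input
stopifnot(identical(first_by_split(bigVec), first_by_sub(bigVec)))

# Rough base R timings (values will vary from run to run)
system.time(first_by_split(bigVec))
system.time(first_by_sub(bigVec))
```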

I'm also so used to using gsub() that I forgot to just use sub(). Its benchmarks are almost identical to gsub()'s, but sub() is the fairer comparison to stri_replace_first_regex(), since both replace only the first match.
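
For completeness, a minimal sketch of the sub() variant, reusing the example vector from the question (`first_sub` and `first_gsub` are just illustrative names). Because the pattern `,.*$` consumes everything from the first comma to the end of the string, a single replacement per string is enough, so sub() and gsub() give the same result here:

```r
egVec <- c("a - (2),bewc", "c,d,e", "efd, ejw, qdn", "we3, asw - 23")

# sub() replaces only the first match in each string
first_sub <- sub(",.*$", "", egVec)

# gsub() replaces every match, but this pattern can only match once per string
first_gsub <- gsub(",.*$", "", egVec)

first_sub
## [1] "a - (2)" "c"       "efd"     "we3"

identical(first_sub, first_gsub)
## [1] TRUE
```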

Upvotes: 4
