Reputation: 111
I have a character vector from which I want to extract some strings. I can do this with a loop, but I was wondering whether the same can be done without one. Below is an example vector along with the code I have.
egVec = c("a - (2),bewc", "c,d,e","efd, ejw, qdn", "we3, asw - 23")
I want to extract the first comma-separated element of each string, so that the required output is:
Vec1
[1] "a - (2)" "c" "efd" "we3"
My code which uses a for loop:
Vec1 = character(length(egVec))  # preallocate the result vector
for (i in seq_along(egVec)){
  SplitVec = unlist(strsplit(egVec[i], ","))  # split one string on commas
  Vec1[i] = SplitVec[1]  # keep only the first piece
}
Upvotes: 0
Views: 159
Reputation: 78792
library(purrr)
library(stringi)
egVec <- c("a - (2),bewc", "c,d,e","efd, ejw, qdn", "we3, asw - 23")
strsplit(egVec, ",") %>%
vapply(`[`, character(1), 1) # type-safe base R
## [1] "a - (2)" "c" "efd" "we3"
strsplit(egVec, ",") %>%
sapply(`[`, 1) # non-type-safe base R
## [1] "a - (2)" "c" "efd" "we3"
strsplit(egVec, ",") %>%
map_chr(1) # type-safe tidyverse
## [1] "a - (2)" "c" "efd" "we3"
stri_split_fixed(egVec, ",", 2, simplify=TRUE)[,1] # stringi one-liner splitting strings
## [1] "a - (2)" "c" "efd" "we3"
gsub(",.*$", "", egVec) # base R one-liner string replacing
## [1] "a - (2)" "c" "efd" "we3"
stri_replace_first_regex(egVec, ",.*$", "") # stringi one-liner string replacing
## [1] "a - (2)" "c" "efd" "we3"
Benchmark:
library(microbenchmark)
library(ggplot2)
microbenchmark(
vapply=strsplit(egVec, ",") %>% vapply(`[`, character(1), 1),
sapply=strsplit(egVec, ",") %>% sapply(`[`, 1),
map_chr=strsplit(egVec, ",") %>% map_chr(1),
stri_split=stri_split_fixed(egVec, ",", 2, simplify=TRUE)[,1] ,
gsub=gsub(",.*$", "", egVec),
stri_replace=stri_replace_first_regex(egVec, ",.*$", "")
) -> mb
mb
## Unit: microseconds
##          expr     min       lq      mean   median       uq      max neval cld
##        vapply 109.657 140.6025 169.51454 159.9715 181.4645 1102.825   100   b
##        sapply 125.206 147.8225 176.49470 172.4420 196.8730  396.046   100   b
##       map_chr 123.767 145.7385 179.12090 177.9535 198.2710  325.098   100   b
##    stri_split   6.626  12.7120  15.60843  14.6755  17.6315   68.299   100  a
##          gsub  13.912  20.5335  24.99184  23.8180  28.1800   45.563   100  a
##  stri_replace  17.532  25.8590  30.81416  28.9465  31.0715  170.869   100  a
autoplot(mb)
(Not an optimal test harness for the benchmark, but I figured stri_split… would come out on top.)
I'm also so used to using gsub() that I forgot to just use sub(). It has almost identical benchmarks to gsub(), though. However, it's fairer to use sub() for the comparison to stri_replace_first_regex().
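For completeness, here is a minimal sketch of the sub() variant mentioned above. It was not part of the benchmark run, so no timings are claimed for it. Because the pattern ",.*$" already matches everything from the first comma to the end of the string, a single replacement per element should be all that is needed:

```r
egVec <- c("a - (2),bewc", "c,d,e", "efd, ejw, qdn", "we3, asw - 23")

# sub() replaces only the first match in each element; the pattern
# consumes everything from the first comma through the end of the string.
Vec1 <- sub(",.*$", "", egVec)
Vec1
## [1] "a - (2)" "c"       "efd"     "we3"
```

Elements that contain no comma are returned unchanged, which matches what the strsplit()-based versions give for such input.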
Upvotes: 4