Reputation: 27
I have a character vector:
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")
I want to extract only the elements that start with "M" (upper or lower case). How can I do this?
I have tried grep(), sub(), and other variants of these functions, but I can't get it right.
I expect the output to be a character vector containing "Mom" and "mother".
Upvotes: 0
Views: 2608
Reputation: 27792
Plain grep will also do just fine:
grep( "^m", a, ignore.case = TRUE, value = TRUE )
#[1] "Mom" "mother"
benchmarks
tom's answer (startsWith) is the winner, but there is some room for improvement (check the code for startsWith2).
microbenchmark::microbenchmark(
  substr      = a[substr(a, 1, 1) %in% c("M", "m")],
  grepl       = a[grepl("^[Mm]", a)],
  grep        = grep("^m", a, ignore.case = TRUE, value = TRUE),
  stringr     = unlist(stringr::str_extract_all(a, stringr::regex("^M.*", ignore_case = TRUE))),
  startsWith1 = a[startsWith(toupper(a), "M")],
  # caution: the prefix vector is recycled along a, so each element is tested
  # against only one of "M" or "m"; the result is right for this particular a
  startsWith2 = a[startsWith(a, c("M", "m"))]
)
# Unit: nanoseconds
#         expr   min      lq     mean median    uq    max neval
#       substr  1808  2411.0  3323.19   3314  3917   8435   100
#        grepl  3916  4218.0  5438.06   4820  6930   8436   100
#         grep  3615  4368.5  5450.10   4820  6929  19582   100
#      stringr 50913 53023.0 55764.10  54529 55132 174432   100
#  startsWith1  1506  2109.0  2814.11   2711  3013  17474   100
#  startsWith2   602  1205.0  1410.17   1206  1507   3013   100
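One caveat about startsWith2: startsWith() recycles its prefix argument along the input, so each element is compared with only one of the two prefixes. It returns the expected result for this a only because "Mom" happens to line up with "M" and "mother" with "m". A minimal sketch of an order-independent variant follows; the helper name starts_with_m and the vector b are made up for illustration, and this variant was not part of the timings above.

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# Recycling pitfall: "mother" is compared with "M" and "Mom" with "m"
b <- c("mother", "Mom")
recycled <- startsWith(b, c("M", "m"))
recycled                      # FALSE FALSE

# Hypothetical helper: test each prefix against every element, then combine
starts_with_m <- function(x) startsWith(x, "M") | startsWith(x, "m")

a[starts_with_m(a)]           # "Mom" "mother"
b[starts_with_m(b)]           # "mother" "Mom"
```

This does two passes over the vector instead of one, so it will likely be somewhat slower than startsWith2, but it does not depend on the order of the elements.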
Upvotes: 2
Reputation: 33743
substr is a very tractable base R function:
a[substr(a, 1, 1) %in% c("M", "m")]
# [1] "Mom" "mother"
And since you mentioned sub(), you could also do this (not necessarily recommended, though):
a[sub("(.).*", "\\1", a) %in% c("M", "m")]
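To see why the sub() version works, it may help to look at the intermediate result. This is just a sketch of the same call split into two steps; the variable name first_chars is made up for illustration.

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# "(.).*" captures the first character and matches the rest of the string;
# the replacement "\\1" keeps only the captured character
first_chars <- sub("(.).*", "\\1", a)
first_chars                          # "M" "m" "A" "B" "P"

# Keep the elements whose first character is "M" or "m"
a[first_chars %in% c("M", "m")]      # "Mom" "mother"
```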
Upvotes: 0
Reputation: 522762
Use grepl with the pattern ^[Mm]:
a[grepl("^[Mm]", a)]
[1] "Mom" "mother"
Here is what the pattern ^[Mm] means:
^ from the start of the string
[Mm] match either a lowercase or uppercase letter M
The grepl function only checks whether the pattern matches somewhere in each string and returns TRUE or FALSE per element, so we don't need to be concerned with the rest of the string.
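As a small illustration of the explanation above, here is the logical vector that grepl returns, with and without the ^ anchor. The variable names hits and unanchored are made up for this sketch.

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# One TRUE/FALSE per element: does the string start with "M" or "m"?
hits <- grepl("^[Mm]", a)
hits                  # TRUE TRUE FALSE FALSE FALSE
a[hits]               # "Mom" "mother"

# Without the anchor, an "m" or "M" anywhere in the string counts,
# so "Prime Minister" is matched as well
unanchored <- grepl("[Mm]", a)
a[unanchored]         # "Mom" "mother" "Prime Minister"
```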
Upvotes: 1
Reputation: 13319
Using stringr:
library(stringr)
unlist(str_extract_all(a, regex("^M.*", ignore_case = TRUE)))
[1] "Mom" "mother"
Upvotes: 2