Ayush Rastogi

Reputation: 27

Extracting specific elements from a character vector

I have a character vector:

a=c("Mom", "mother", "Alex", "Betty", "Prime Minister")

I want to extract only the elements that start with "M" (either upper or lower case).

How can I do this?

I have tried grep(), sub() and other variants of these functions, but I am not getting it right.

I expect the output to be a character vector containing "Mom" and "mother".

Upvotes: 0

Views: 2608

Answers (5)

Wimpel

Reputation: 27792

Plain grep() will also do just fine:

grep( "^m", a, ignore.case = TRUE, value = TRUE )
#[1] "Mom"    "mother"

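Benchmarks
tom's answer (startsWith) is the fastest of the general solutions. startsWith2 is faster still, but startsWith() recycles its prefix argument element-wise, so c("M", "m") is matched position by position against a; it returns the right result here only because "Mom" and "mother" happen to line up with "M" and "m".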

microbenchmark::microbenchmark(
  substr = a[substr(a, 1, 1) %in% c("M", "m")],
  grepl = a[grepl("^[Mm]", a)],
  grep = grep( "^m", a, ignore.case = TRUE, value = TRUE ),
  stringr = unlist(stringr::str_extract_all(a, stringr::regex("^M.*", ignore_case = TRUE))),
  startsWith1 = a[startsWith(toupper(a), "M")],
  startsWith2= a[startsWith(a, c("M", "m"))]
)


# Unit: nanoseconds
#        expr   min      lq     mean median    uq    max neval
#      substr  1808  2411.0  3323.19   3314  3917   8435   100
#       grepl  3916  4218.0  5438.06   4820  6930   8436   100
#        grep  3615  4368.5  5450.10   4820  6929  19582   100
#     stringr 50913 53023.0 55764.10  54529 55132 174432   100
# startsWith1  1506  2109.0  2814.11   2711  3013  17474   100
# startsWith2   602  1205.0  1410.17   1206  1507   3013   100
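
A quick check of how startsWith2 behaves when the element order changes, as a minimal sketch. The vector b is a hypothetical reordering of a and is not part of the original question; the `|` combination is one possible order-independent variant, not something that was benchmarked above.

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")
# Hypothetical reordering of the same words (assumption, not from the question)
b <- c("mother", "Mom", "Alex", "Betty", "Prime Minister")

# The prefix vector is recycled: b[1] is tested against "M", b[2] against "m", ...
recycled <- b[startsWith(b, c("M", "m"))]

# Testing each prefix separately and combining with | does not depend on order
combined <- b[startsWith(b, "M") | startsWith(b, "m")]

print(recycled)
print(combined)
```

With this input the recycled version selects nothing, while the combined version still selects both words.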

Upvotes: 2

s_baldur

Reputation: 33743

substr is a very tractable base R function:

a[substr(a, 1, 1) %in% c("M", "m")]

# [1] "Mom"    "mother"

And since you mentioned sub(), you could also do this (though it is not necessarily recommended):

a[sub("(.).*", "\\1", a) %in% c("M", "m")]
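
To see why the sub() version works, it may help to look at the intermediate result. This is a small sketch; the name first_chars is introduced here only for illustration.

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# "(.)" captures the first character and ".*" consumes the rest,
# so replacing the whole match with "\\1" leaves only the first character
first_chars <- sub("(.).*", "\\1", a)
print(first_chars)

# For this input, that is the same as what substr(a, 1, 1) returns
print(identical(first_chars, substr(a, 1, 1)))

result <- a[first_chars %in% c("M", "m")]
print(result)
```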

Upvotes: 0

Tim Biegeleisen

Reputation: 522762

Use grepl, with the pattern ^[Mm]:

a[grepl("^[Mm]", a)]

[1] "Mom"    "mother"

Here is what the pattern ^[Mm] means:

^      from the start of the string
[Mm]   match either a lowercase or uppercase letter M

The grepl function works by just asserting that the input pattern matches at least once, so we don't need to be concerned with the rest of the string.
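
A short sketch of what grepl returns and why the ^ anchor matters. The variable names starts_with_m and anywhere are illustrative only.

```r
a <- c("Mom", "mother", "Alex", "Betty", "Prime Minister")

# grepl() returns one TRUE/FALSE per element of a
starts_with_m <- grepl("^[Mm]", a)
print(starts_with_m)

# Without the ^ anchor, an "M" or "m" anywhere in the string matches,
# so "Prime Minister" is selected as well
anywhere <- a[grepl("[Mm]", a)]
print(anywhere)

# Subsetting with the anchored logical vector keeps only the wanted elements
result <- a[starts_with_m]
print(result)
```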

Upvotes: 1

NelsonGon

Reputation: 13319

Using stringr:

library(stringr)
unlist(str_extract_all(a, regex("^M.*", ignore_case = TRUE)))



[1] "Mom"    "mother"

Upvotes: 2

tfehring

Reputation: 394

a[startsWith(toupper(a), "M")]

Upvotes: 2
