Reputation: 415
I have a column of contract names, df$name, like the following:
FB210618C00280000
ADM210618C00280000
M210618P00280000
I would like to extract FB, ADM and M. That is, I want to extract the leading letters, which vary in length, stopping at the first digit, and I don't want to extract the C or P.
The code below also gives me the C or P:
stri_extract_all_regex(df$name, "[A-Z]+")
Upvotes: 0
Views: 623
Reputation: 627469
You can use
library(stringr)
str_extract(df$name, "^[A-Za-z]+")
# Or
str_extract(df$name, "^\\p{L}+")
The stringr::str_extract function extracts the first occurrence of a pattern, and the ^[A-Za-z]+ / ^\p{L}+ regex matches one or more letters at the start of the string. Note that \p{L} matches any Unicode letter.
The same pattern can be used with stringi::stri_extract_first():
library(stringi)
stri_extract_first(df$name, regex="^[A-Za-z]+")
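As a quick check of why the anchored first match matters here, the following sketch runs the pattern on the three sample names from the question and contrasts it with an unanchored extract-all. The vector name contracts and the result names are only illustrative.

```r
library(stringr)

# Sample contract names from the question (the vector name is hypothetical)
contracts <- c("FB210618C00280000", "ADM210618C00280000", "M210618P00280000")

# Anchored at the start: only the leading run of letters is returned
tickers <- str_extract(contracts, "^[A-Za-z]+")
tickers
# [1] "FB"  "ADM" "M"

# Unanchored extract-all also picks up the option-type letter
all_runs <- str_extract_all(contracts[1], "[A-Za-z]+")[[1]]
all_runs
# [1] "FB" "C"
```

With the ^ anchor, a name that does not start with a letter yields NA rather than a later run of letters.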
Upvotes: 2
Reputation: 887891
We can use stri_extract_first from stringi:
library(stringi)
stri_extract_first(df$name, regex = "[A-Z]+")
#[1] "FB" "ADM" "M"
Or we can use sub from base R:
sub("\\d+.*", "", df$name)
#[1] "FB" "ADM" "M"
Or use trimws from base R (the whitespace argument requires R 3.6.0 or later):
trimws(df$name, whitespace = "\\d+.*")
Data:
df <- data.frame(name = c("FB210618C00280000", "ADM210618C00280000",
                          "M210618P00280000"))
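To tie the base R sub approach to the sample data, here is a minimal sketch that stores the result in a new column. The column name ticker is an assumption for illustration, and the pattern assumes the leading symbol itself never contains a digit.

```r
# Sample data from the answer
df <- data.frame(name = c("FB210618C00280000", "ADM210618C00280000",
                          "M210618P00280000"))

# Remove everything from the first digit to the end of the string,
# which leaves only the leading letters
df$ticker <- sub("\\d.*", "", df$name)
df$ticker
# [1] "FB"  "ADM" "M"
```

A name containing no digits at all is returned unchanged by this pattern, so it may be worth validating the input first if that can occur.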
Upvotes: 3