ML33M

Reputation: 415

R: how to match and extract letters of different lengths at the start of a string

I have a column of contract names, df$name, like the ones below:

FB210618C00280000
ADM210618C00280000
M210618P00280000

I would like to extract FB, ADM, and M. That is, I want to extract the leading letters of each string, which vary in length, stopping at the first digit. I don't want to extract the C or P.

The code below also returns the C or P:

stri_extract_all_regex(df$name, "[a-z]+") 

Upvotes: 0

Views: 623

Answers (2)

Wiktor Stribiżew

Reputation: 627469

You can use

library(stringr)
str_extract(df$name, "^[A-Za-z]+")
# Or
str_extract(df$name, "^\\p{L}+")

The stringr::str_extract function extracts the first occurrence of a pattern, and the ^[A-Za-z]+ / ^\p{L}+ regex matches one or more letters at the start of the string. Note that \p{L} matches any Unicode letter.
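
For reference, the same anchored pattern can also be tried without extra packages. This is only a minimal base R sketch; the vector x is made up here from the strings in the question:

```r
x <- c("FB210618C00280000", "ADM210618C00280000", "M210618P00280000")

# regexpr() locates the first match in each string; ^ anchors it to the start,
# so the C or P further along is never reached
m <- regexpr("^[A-Za-z]+", x)
res <- regmatches(x, m)
res
#[1] "FB"  "ADM" "M"
```

One difference to keep in mind: regmatches() drops elements that have no match, whereas str_extract() returns NA for them.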


The same pattern can be used with stringi::stri_extract_first():

library(stringi)
stri_extract_first(df$name, regex="^[A-Za-z]+")

Upvotes: 2

akrun

Reputation: 887891

We can use stri_extract_first from stringi

library(stringi)
stri_extract_first(df$name, regex = "[A-Z]+")
#[1] "FB"  "ADM" "M" 

Or we can use base R with sub

sub("\\d+.*", "", df$name)
#[1] "FB"  "ADM" "M" 

Or use trimws from base R

trimws(df$name, whitespace = "\\d+.*")

data

df <- data.frame(name = c("FB210618C00280000", "ADM210618C00280000", 
    "M210618P00280000"))
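
As a quick check on this data, here is a minimal sketch that stores the base R result in a new column. The column name ticker is an assumption made for illustration and does not come from the question:

```r
df <- data.frame(name = c("FB210618C00280000", "ADM210618C00280000",
    "M210618P00280000"))

# Remove everything from the first digit onward, keeping the leading letters
df$ticker <- sub("\\d+.*", "", df$name)
df$ticker
#[1] "FB"  "ADM" "M"

# A string that starts with a digit gives "" here, not NA
no_letters <- sub("\\d+.*", "", "210618C00280000")
no_letters
#[1] ""
```

If an empty result should be treated as missing, it would need to be converted to NA separately.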

Upvotes: 3
