Reputation: 39
I can't figure out how to get this regex to work.
My sample data vector looks like this:
claims40 1.1010101
clinical41 391.1
...
It follows the pattern of:
I'm trying to create a new column in the data frame containing just the name, which can have a variable number of characters.
So the new column should look like:
claims
clinical
...
When I try to use the expression:
^(.*?)\\d
in regexp, I don't get the correct character match length.
Question: What is the correct regex to capture everything in a string prior to the first number?
Upvotes: 3
Views: 10217
Reputation: 18681
Also with str_extract from stringr:
stringr::str_extract(c("claims40 1.1010101", "clinical41 391.1"), "^[[:alpha:]]+")
# [1] "claims" "clinical"
This extracts the leading alphabetic characters rather than removing everything else.
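Since the question asks for a new column, here is a minimal base R sketch of the same idea without stringr. The data frame `df` and its column `raw` are hypothetical names made up for illustration; only the two sample strings come from the question.

```r
# Hypothetical data frame; the column name `raw` is assumed for illustration.
df <- data.frame(raw = c("claims40 1.1010101", "clinical41 391.1"),
                 stringsAsFactors = FALSE)

# Remove the first digit and everything after it, leaving the leading name.
df$name <- sub("[0-9].*$", "", df$raw)

# df$name now holds c("claims", "clinical")
df$name
```

Using sub() rather than gsub() is enough here because the pattern is anchored to the end of the string and can match at most once.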
Upvotes: 0
Reputation: 5893
If you specifically want to match up to the first digit, you could also do this:
gsub("^(.+?)(?=\\d).*", "\\1", c("claims40 1.1010101", "clinical41 391.1"), perl = TRUE)
# [1] "claims" "clinical"
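A likely reason the pattern in the question reports the wrong length: the overall match of ^(.*?)\\d includes the digit itself, so it is one character longer than the name. The sketch below assumes the question used regexpr() (the post only says "regexp") and compares it with a lookahead version, which checks for the digit without consuming it.

```r
x <- c("claims40 1.1010101", "clinical41 391.1")

# The whole match includes the first digit: "claims4" and "clinical4".
with_digit <- regexpr("^(.*?)\\d", x, perl = TRUE)
len_with_digit <- attr(with_digit, "match.length")      # 7 9

# The lookahead (?=\\d) requires a digit next but leaves it out of the match.
before_digit <- regexpr("^.*?(?=\\d)", x, perl = TRUE)
len_before_digit <- attr(before_digit, "match.length")  # 6 8

# Pull out the matched text itself.
name_part <- regmatches(x, before_digit)                # "claims" "clinical"
```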
Upvotes: 0
Reputation: 12713
gsub("[^a-zA-Z]", "", c("claims40 1.1010101", "clinical41 391.1"))
# [1] "claims" "clinical"
The same with a POSIX character class:
gsub("[^[:alpha:]]", "", c("claims40 1.1010101", "clinical41 391.1"))
# [1] "claims" "clinical"
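One caveat, sketched with a made-up input that is not in the question: removing every non-letter also keeps any letters that appear after the number, whereas cutting at the first digit keeps only the leading name. For the sample data the two approaches agree.

```r
# Hypothetical input with letters after the number (not from the question).
x <- c("claims40 abc", "clinical41 391.1")

# Strips all non-letters, so the trailing "abc" is glued onto the name.
all_letters <- gsub("[^[:alpha:]]", "", x)   # "claimsabc" "clinical"

# Cuts at the first digit, so only the leading name remains.
leading_name <- sub("[0-9].*$", "", x)       # "claims" "clinical"
```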
Upvotes: 2