Reputation: 39

regex to get everything before first number

I can't figure out how to get this regex to work.

My sample data vector looks like this:

claims40 1.1010101
clinical41 391.1
...

It follows the pattern of a name, immediately followed by a number, then a space and a decimal number.

I'm trying to create a new column in the data frame with just the name, which can be a variable number of characters long.

So the new column should look like:

claims
clinical
...

When I try to use the expression:

^(.*?)\\d

in regexp, I don't get the correct character match length.
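One likely cause, assuming regexpr() (or a similar function) is what was used here: the whole match of ^(.*?)\\d includes the first digit itself, so match.length is one character longer than the name. The sketch below shows this. It then reads the length of capture group 1 instead, which regexpr() records when perl = TRUE is set.

```r
x <- c("claims40 1.1010101", "clinical41 391.1")

# The whole match is "claims4" / "clinical4": it includes the first digit.
m <- regexpr("^(.*?)\\d", x, perl = TRUE)
attr(m, "match.length")
# match lengths are 7 and 9, one more than the names

# With perl = TRUE, regexpr() also stores capture-group positions;
# the length of group 1 is the length of the name alone.
name_len <- attr(m, "capture.length")[, 1]
substr(x, 1, name_len)
# [1] "claims"   "clinical"
```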

Question: What is the correct regex to capture everything in a string prior to the first number?

Upvotes: 3

Answers (3)

Reputation: 18681

Also with str_extract from stringr:

stringr::str_extract(c("claims40 1.1010101", "clinical41 391.1"), "^[[:alpha:]]+")
# [1] "claims"   "clinical"

This "extracts" the leading run of alphabetical characters instead of removing everything else.
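A base-R sketch of the same "extract" idea (no stringr needed) uses regexpr() with regmatches(). The third input, "abc1def", is a made-up string added only to show how extracting the leading letters differs from deleting every non-letter.

```r
x <- c("claims40 1.1010101", "clinical41 391.1", "abc1def")

# Extract: keep only the run of letters at the start of each string
extracted <- regmatches(x, regexpr("^[[:alpha:]]+", x))
extracted
# [1] "claims"   "clinical" "abc"

# Remove: delete every non-letter anywhere, so letters after the number survive
removed <- gsub("[^[:alpha:]]", "", x)
removed
# [1] "claims"   "clinical" "abcdef"
```

Note that regmatches() with regexpr() silently drops elements that have no match, so the result can be shorter than the input when a string does not start with a letter.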

Upvotes: 0

Reputation: 5893

If you specifically want to match everything up to the first digit, you could also use a lookahead:

gsub("^(.+?)(?=\\d).*", "\\1", c("claims40 1.1010101", "clinical41 391.1"), perl = TRUE)
# [1] "claims"   "clinical"
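The lookahead (?=\\d) stops the lazy group just before the first digit without consuming it, and the trailing .* lets the replacement discard the rest of the string. Because the pattern is anchored and can only match once per string, sub() gives the same result as gsub(). The second line below is an alternative not taken from the answer: it captures the leading run of non-digits with a negated character class, so it needs no lookahead and runs with R's default regex engine.

```r
x <- c("claims40 1.1010101", "clinical41 391.1")

# Lookahead version: sub() suffices because the anchored pattern matches once
lookahead <- sub("^(.+?)(?=\\d).*", "\\1", x, perl = TRUE)
lookahead
# [1] "claims"   "clinical"

# Lookahead-free alternative: keep the leading non-digits, drop the rest
negated <- sub("^([^0-9]*).*", "\\1", x)
negated
# [1] "claims"   "clinical"
```

The two differ on edge cases: for a string that starts with a digit, the negated-class version returns "", while .+? forces the lookahead version to keep at least one character.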

Upvotes: 0

Reputation: 12713

gsub("[^a-zA-Z]", "", c("claims40 1.1010101", "clinical41 391.1"))
# [1] "claims"   "clinical"

Or, using a POSIX character class:

gsub("[^[:alpha:]]", "", c("claims40 1.1010101", "clinical41 391.1"))
# [1] "claims"   "clinical"
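To build the new column the question asks for, a pattern can be applied directly to a data frame column. This is a minimal sketch, assuming a hypothetical data frame df whose raw strings sit in a hypothetical column raw. It uses sub() to delete everything from the first digit onward. Unlike the letter-only patterns, this keeps characters such as _ or . that appear in the name before the first digit.

```r
# Hypothetical data frame standing in for the asker's data
df <- data.frame(raw = c("claims40 1.1010101", "clinical41 391.1"),
                 stringsAsFactors = FALSE)

# Delete everything from the first digit to the end of the string;
# what remains is the name before the first number.
df$name <- sub("[0-9].*", "", df$raw)
df$name
# [1] "claims"   "clinical"
```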

Upvotes: 2