Reputation: 750
I have a character vector x with values such as:
"W1W", "BT3", "BS5", "E1W", "B68"
From this I need to extract the characters before the first numeric character to get e.g.
"W", "BT", "BS", "E", "B"
I have tried looking through previous questions and found:
gsub("[^a-zA-Z]", "", x)
but this keeps the text characters following the numeric character and results in:
"WW", "BT", "BS", "EW", "B"
Is there any way to get only the leading text characters before the numeric character and drop everything afterwards?
Upvotes: 4
Views: 3245
Reputation: 887118
Using regmatches/regexpr from base R:
regmatches(x, regexpr("\\D+(?=\\d)", x, perl = TRUE))
#[1] "W" "BT" "BS" "E" "B"
x <- c("W1W", "BT3", "BS5", "E1W", "B68")
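One thing to watch with this approach: regmatches returns only the elements that matched, so the result can be shorter than the input. A minimal sketch, assuming a hypothetical vector y whose last element contains no digit (y, m and res are illustrative names, not part of the question):

```r
# Hypothetical input: "ABC" has no digit, so the lookahead (?=\\d) cannot match it
y <- c("W1W", "BT3", "ABC")

# regexpr returns -1 for elements without a match
m <- regexpr("\\D+(?=\\d)", y, perl = TRUE)

# Only the matched elements come back, so this has length 2, not 3
regmatches(y, m)
#[1] "W"  "BT"

# To keep the result aligned with the input, fill non-matches with NA
res <- rep(NA_character_, length(y))
res[m > 0] <- regmatches(y, m)
res
#[1] "W"  "BT" NA
```

If every element is known to contain a digit after at least one leading non-digit, as in the question's data, the plain regmatches call is enough.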
Upvotes: 1
Reputation: 626845
You may use
sub("^(\\D+).*", "\\1", x)
If a digit must be present and may appear at the very start of the string (in which case you want an empty value), use
sub("^(\\D*)\\d.*", "\\1", x)
See the regex demo and regex demo #2
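The two patterns differ only on edge cases. A small sketch, assuming a hypothetical vector y with one string that has no digit and one that starts with a digit (y, r1 and r2 are illustrative names):

```r
# Hypothetical edge cases: a normal value, no digit at all, and a leading digit
y <- c("W1W", "ABC", "1AB")

# First pattern: needs at least one leading non-digit.
# "1AB" does not match, so sub() returns it unchanged.
r1 <- sub("^(\\D+).*", "\\1", y)
r1
#[1] "W"   "ABC" "1AB"

# Second pattern: needs a digit somewhere after zero or more non-digits.
# "ABC" does not match and is returned unchanged; "1AB" becomes "".
r2 <- sub("^(\\D*)\\d.*", "\\1", y)
r2
#[1] "W"   "ABC" ""
```

In both cases sub() leaves a non-matching string as it is rather than returning an empty value, so which pattern fits depends on how those inputs should be treated.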
The regex matches
^ - start of string
(\D*) - 0+ non-digit symbols
\d - a digit
.* - any 0+ chars to the end of the string
Upvotes: 1
Reputation: 5456
x <- c("W1W", "BT3", "BS5", "E1W", "B68")
library(stringr)
str_extract(x, "^\\D+")
# [1] "W" "BT" "BS" "E" "B"
Upvotes: 1
Reputation: 269634
Using x from the Note at the end, remove everything from the first digit onwards:
sub("\\d.*", "", x)
## [1] "W" "BT" "BS" "E" "B"
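As a rough illustration of how this behaves outside the sample data, here is a sketch with a hypothetical vector y that includes a missing value, a string without digits, and a string starting with a digit (y and out are illustrative names):

```r
# Hypothetical inputs covering the edge cases
y <- c("W1W", NA, "ABC", "1AB")

# No digit: nothing is removed. Leading digit: the whole string is removed.
# NA stays NA.
out <- sub("\\d.*", "", y)
out
## [1] "W"   NA    "ABC" ""
```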
Note:
x <- c("W1W", "BT3", "BS5", "E1W", "B68")
Upvotes: 0