Mel

Reputation: 750

How to extract all text before first numeric character only in R

I have a character vector x such as:

"W1W", "BT3", "BS5", "E1W", "B68"

From each element I need to extract the characters before the first digit, giving:

"W", "BT", "BS", "E", "B"

I have tried looking through previous questions and found:

gsub("[^a-zA-Z]", "", x)

but this also keeps the letters that follow the digit, which gives:

"WW", "BT", "BS", "EW", "B"

Is there any way to get only the leading text characters before the numeric character and drop everything afterwards?

Upvotes: 4

Views: 3245

Answers (4)

akrun

Reputation: 887118

Using regmatches() and regexpr() from base R:

regmatches(x, regexpr("\\D+(?=\\d)", x, perl = TRUE))
#[1] "W"  "BT" "BS" "E"  "B" 

data

x <- c("W1W", "BT3", "BS5", "E1W", "B68")
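One caveat worth checking: regmatches() returns only the elements that matched, so the result can be shorter than the input. The sketch below uses a hypothetical vector y (not from the question) that includes a string with no digit, and shows one possible way to keep the result aligned with the input by filling non-matches with NA.

```r
# Hypothetical input: "ABC" contains no digit, so the lookahead never succeeds
y <- c("W1W", "ABC", "BT3")

# regexpr() returns -1 for elements without a match
m <- regexpr("\\D+(?=\\d)", y, perl = TRUE)

# regmatches() silently drops the non-matching element
short <- regmatches(y, m)
print(short)

# Pre-fill with NA and assign the matches back by position
aligned <- rep(NA_character_, length(y))
aligned[m != -1] <- regmatches(y, m)
print(aligned)
```

If every element is guaranteed to contain a digit after at least one non-digit, as in the question's data, the one-liner above is enough.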

Upvotes: 1

Wiktor Stribiżew

Reputation: 626845

You may use

sub("^(\\D+).*", "\\1", x)

If every string must contain a digit, and a string may begin with one (in which case you need an empty result), use

sub("^(\\D*)\\d.*", "\\1", x)


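The second regex matches

  • ^ - start of string
  • (\D*) - zero or more non-digit characters, captured in group 1
  • \d - a digit
  • .* - any zero or more characters, up to the end of the string

The replacement \\1 keeps only the text captured in group 1.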
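To see where the two patterns differ, here is a small sketch on a hypothetical vector y (not from the question) that adds a string starting with a digit and a string with no digit at all. When the pattern does not match, sub() returns the input unchanged.

```r
# Hypothetical edge cases: leading digit ("12AB") and no digit ("ABC")
y <- c("W1W", "12AB", "ABC")

# First pattern: needs at least one leading non-digit;
# "12AB" does not match, so it is returned unchanged
first <- sub("^(\\D+).*", "\\1", y)
print(first)

# Second pattern: needs a digit somewhere after optional non-digits;
# "12AB" becomes "", while "ABC" does not match and is returned unchanged
second <- sub("^(\\D*)\\d.*", "\\1", y)
print(second)
```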

Upvotes: 1

r.user.05apr

Reputation: 5456

x <- c("W1W", "BT3", "BS5", "E1W", "B68")

library(stringr)

str_extract(x, "^\\D+")

# [1] "W"  "BT" "BS" "E"  "B" 

Upvotes: 1

G. Grothendieck

Reputation: 269634

Using x in the Note at the end, remove everything from the first digit onwards:

sub("\\d.*", "", x)
## [1] "W"  "BT" "BS" "E"  "B" 

Note

x <- c("W1W", "BT3", "BS5", "E1W", "B68")
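If some strings might contain no digit at all, sub() leaves them unchanged, which may or may not be what is wanted. As a sketch only, the hypothetical helper leading_non_digits() below (the name and the NA convention are assumptions, not part of the answer) wraps the same sub() call and marks digit-free strings as NA.

```r
# Hypothetical helper: text before the first digit, NA if there is no digit
leading_non_digits <- function(s) {
  out <- sub("\\d.*", "", s)               # drop everything from the first digit on
  out[!grepl("\\d", s)] <- NA_character_   # no digit present: report NA instead
  out
}

res <- leading_non_digits(c("W1W", "ABC", "9XY"))
print(res)
```

A string that starts with a digit, such as "9XY", yields an empty string rather than NA under this convention.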

Upvotes: 0
