NBK

Reputation: 905

Remove everything after the first number using R regex

How can I remove everything after the first number in a string?

x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")

Following this question, I managed to remove everything before the first number, including the number itself:

gsub( "^\\D*\\d+", "", x )
[1] " apt 1"    ", block 3"

But the desired output looks like this:

[1] "Hubert 208"     "Mass Av 300"

Upvotes: 4

Views: 3131

Answers (4)

Andrew

Reputation: 5138

You could also keep your current regex pattern and extract the match with stringr::str_extract instead of removing it:

x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")
stringr::str_extract(x, "^\\D*\\d+")

[1] "Hubert 208"  "Mass Av 300"
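
If you would rather not depend on stringr, a base R sketch of the same idea pairs regexpr() with regmatches(). One caveat: regmatches() silently drops elements that do not match, whereas str_extract() returns NA for them, so the helper below (extract_first is a hypothetical name, not an existing function) fills those positions with NA explicitly. The third input string is an added example, not from the question.

```r
# Hypothetical helper: a base R counterpart of stringr::str_extract()
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)              # start of the first match, or -1 if none
  out <- rep(NA_character_, length(x))  # default to NA, like str_extract()
  out[m != -1] <- regmatches(x, m)      # regmatches() returns only the matched parts
  out
}

x <- c("Hubert 208 apt 1", "Mass Av 300, block 3", "no number")
res <- extract_first(x, "^\\D*\\d+")
res
# returns c("Hubert 208", "Mass Av 300", NA)
```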

Upvotes: 1

The fourth bird

Reputation: 163207

Instead of replacing, another option is to keep your expression and use the match itself.

Your pattern matches up to and including the first run of digits: from the start of the string ^, it matches zero or more non-digit characters \D*, followed by one or more digits \d+:

^\\D*\\d+
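
To see what each piece of the pattern contributes, a small sketch can wrap \\D* and \\d+ in capture groups (added here only for illustration) and inspect them with regexec() and regmatches(). The first element of each result is the whole match, which is exactly the part you want to keep.

```r
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")

# Same pattern, with \D* and \d+ grouped only to show what each part matches
parts <- regmatches(x, regexec("^(\\D*)(\\d+)", x))
parts[[1]]  # whole match, non-digit prefix, first run of digits
# returns c("Hubert 208", "Hubert ", "208")
parts[[2]]
# returns c("Mass Av 300", "Mass Av ", "300")
```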


If you use sub with perl = TRUE, you can use \K, which resets the start of the reported match so that everything matched before it is left out of the replacement.

The pattern then becomes:

^\\D*\\d+\\K.*


Then use an empty string as the replacement:

sub("^\\D*\\d+\\K.*", "", x, perl=TRUE)
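
A quick check of the \K version on the question's data, plus one string without any digit (an added test case, not from the question), shows that elements with no match are returned unchanged by sub():

```r
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3", "no number here")

# \K keeps "Hubert 208" out of the reported match, so only the tail (.*) is replaced
out <- sub("^\\D*\\d+\\K.*", "", x, perl = TRUE)
out
# returns c("Hubert 208", "Mass Av 300", "no number here")
```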

Upvotes: 1

Emma

Reputation: 27723

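This expression might be slightly safer: it skips any leading whitespace (\s*), then captures the text before the first number lazily in one group and the first run of digits in a second group: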

^\s*(.+?)([0-9]+)
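
The answer gives only the pattern. One way to use it in R, assuming the goal is the question's output, is to append .* so the rest of the string is consumed and to keep both groups with \\1\\2. perl = TRUE is used here so the lazy quantifier +? is handled by PCRE; the input with leading spaces is an added example, not from the question.

```r
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3", "  Hubert 208 apt 1")

# \s* skips leading spaces, (.+?) lazily takes the text before the first number,
# ([0-9]+) takes the number, and .* (added here) swallows the remainder
out <- sub("^\\s*(.+?)([0-9]+).*", "\\1\\2", x, perl = TRUE)
out
# returns c("Hubert 208", "Mass Av 300", "Hubert 208")
```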


Upvotes: 1

akrun

Reputation: 886998

A minor change to the OP's current code makes it work: capture the matching pattern as a group ((...)) and replace with the backreference (\\1).

sub("^(\\D*\\d+).*", "\\1", x)
#[1] "Hubert 208"  "Mass Av 300"

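Here, the OP's pattern ("^\\D*\\d+") matches zero or more non-digit characters (\\D*) from the start (^) of the string, followed by one or more digits (\\d+), and this part is captured as a group with parentheses ((...)). The trailing .* matches the rest of the string, which is dropped because the replacement keeps only the captured group \\1.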

Also, instead of gsub (global substitution), sub is enough, because we only need to match a single instance (anchored at the beginning).
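
To illustrate the sub versus gsub point, a small sketch compares both on the anchored pattern and on an unanchored one; the string "apt 1, block 3" is an invented example.

```r
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")

# Anchored with ^ and ending in .*, the pattern can match only once per string,
# so sub() and gsub() give the same result here
same <- identical(sub("^(\\D*\\d+).*", "\\1", x), gsub("^(\\D*\\d+).*", "\\1", x))
same
# returns TRUE

# Without an anchor the difference shows: gsub() replaces every run of digits
sub("\\d+", "#", "apt 1, block 3")   # returns "apt #, block 3"
gsub("\\d+", "#", "apt 1, block 3")  # returns "apt #, block #"
```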

Upvotes: 6
