Reputation: 905
How to remove everything after the first number in a string?
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")
Following this question, I managed to remove everything up to and including the first number:
gsub( "^\\D*\\d+", "", x )
[1] " apt 1" ", block 3"
But the desired output looks like this:
[1] "Hubert 208" "Mass Av 300"
Upvotes: 4
Views: 3131
Reputation: 5138
You could also use your current regex pattern with stringr::str_extract:
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")
stringr::str_extract(x, "^\\D*\\d+")
[1] "Hubert 208" "Mass Av 300"
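If adding stringr as a dependency is not desirable, the same extraction can be sketched in base R with regexpr() and regmatches(). One difference to keep in mind: regmatches() drops elements that have no match, whereas str_extract() returns NA for them. The names m and res are only illustrative.

```r
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")

# regexpr() locates the first match of the pattern in each string
m <- regexpr("^\\D*\\d+", x)

# regmatches() extracts the matched substrings
res <- regmatches(x, m)
res
# [1] "Hubert 208" "Mass Av 300"
```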
Upvotes: 1
Reputation: 163207
Another option, instead of replacing, is to keep your expression and use the match itself.
Your pattern matches up to and including the first run of digits: from the start of the string (^), it matches zero or more non-digit characters (\D*) followed by one or more digits (\d+):
^\\D*\\d+
If you use sub with perl=TRUE, you can make use of \K to forget what was matched so far.
Then you might use:
^\\D*\\d+\\K.*
Use an empty string as the replacement:
sub("^\\D*\\d+\\K.*", "", x, perl=TRUE)
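Run on the example vector from the question, this should give the desired output. The snippet below is just a quick check; the variable name res is illustrative.

```r
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")

# \\K discards what has been matched so far, so only the trailing .*
# is part of the reported match and gets replaced by ""
res <- sub("^\\D*\\d+\\K.*", "", x, perl = TRUE)
res
# [1] "Hubert 208" "Mass Av 300"
```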
Upvotes: 1
Reputation: 886998
In the OP's current code, a minor change makes it work: capture the matching pattern as a group ((...)) and replace with the backreference (\\1):
sub("^(\\D*\\d+).*", "\\1", x)
#[1] "Hubert 208" "Mass Av 300"
Here, the OP's pattern ("^\\D*\\d+") matches zero or more characters that are not a digit (\\D*) from the start (^) of the string, followed by one or more digits (\\d+); this is captured as a group with parentheses ((...)). The trailing .* matches the rest of the string, so replacing the whole match with \\1 keeps only the captured part.
Also, instead of gsub (global substitution), we need only sub, as we need to match only a single instance (from the beginning).
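One edge case worth knowing about, sketched here with a made-up second string: when a string contains no digit at all, the pattern does not match, so sub() returns that string unchanged rather than NA or an empty string.

```r
# The second element is a hypothetical input with no digits
y <- c("Hubert 208 apt 1", "No number here")

res <- sub("^(\\D*\\d+).*", "\\1", y)
res
# first element becomes "Hubert 208";
# second element has no match and stays "No number here"
```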
Upvotes: 6