Reputation: 958
I am trying to use gsub to remove every character after the digit in each value of a column of my dataframe:
Tumoral_stage Methastatic_stage
T1a M0
T1b M0
T2c M0
T3b M0
T1c M0
T2 M0
T3a M1
I would like to get this dataframe:
Tumoral_stage Methastatic_stage
T1 M0
T1 M0
T2 M0
T3 M0
T1 M0
T2 M0
T3 M1
I would like to use gsub to achieve this, but I don't know how to write a pattern that removes everything after a digit.
Upvotes: 1
Views: 489
Reputation: 21400
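Consider capturing the part you want to keep and referring to it in the replacement with the backreference \\1:
x <- c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a")
# keep everything up to the last digit; the trailing .* drops whatever follows it
sub("(.*\\d).*", "\\1", x)
[1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"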
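To apply this to the column of the data frame instead of a standalone vector, assign the result back to the column. This is a minimal sketch: the data frame name df is hypothetical and is rebuilt here from the example in the question, and it uses the pattern "(.*\\d).*", which keeps everything up to the last digit and drops the rest.

```r
# Hypothetical data frame 'df' rebuilt from the question's example data
df <- data.frame(
  Tumoral_stage = c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a"),
  Methastatic_stage = c("M0", "M0", "M0", "M0", "M0", "M0", "M1"),
  stringsAsFactors = FALSE
)

# Overwrite the column: keep up to (and including) the last digit, drop the rest
df$Tumoral_stage <- sub("(.*\\d).*", "\\1", df$Tumoral_stage)
df$Tumoral_stage
# [1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"
```

The Methastatic_stage column is left untouched, since only Tumoral_stage is reassigned.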
Upvotes: 0
Reputation: 33498
Using sub() with a positive lookbehind:
x <- c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a")
sub("(?<=[0-9]).+", "", x, perl = TRUE)
# [1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"
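Since the question asks for gsub specifically: with this pattern gsub() gives the same result as sub(), because the first match already runs to the end of the string. One thing worth knowing is that the lookbehind fires right after the first digit, so a multi-digit value would be cut short. The value "T12a" below is hypothetical (it is not in the question's data); requiring the removed part to start with a non-digit is one way to keep all the digits.

```r
x <- c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a")

# gsub() gives the same result: the first match consumes the rest of the string
gsub("(?<=[0-9]).+", "", x, perl = TRUE)
# [1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"

# Hypothetical multi-digit value: the plain lookbehind cuts after the first digit
sub("(?<=[0-9]).+", "", "T12a", perl = TRUE)
# [1] "T1"

# Variant: only remove text that starts with a non-digit, so all digits are kept
sub("(?<=[0-9])[^0-9].*", "", "T12a", perl = TRUE)
# [1] "T12"
```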
Upvotes: 4
Reputation: 886968
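We can also use substr, which works here because every stage is exactly two characters long once the suffix is dropped:
x <- c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a")
substr(x, 1, 2)
Or with str_remove from stringr, which deletes the trailing run of non-digits:
library(stringr)
str_remove(x, "[^0-9]+$")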
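The str_remove() pattern also works with base R's sub(), so stringr is not strictly needed. The sketch below also shows where substr(x, 1, 2) stops being safe: it assumes the prefix is always exactly two characters, which a hypothetical two-digit value such as "T12a" (not in the question's data) would break.

```r
x <- c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a")

# Base-R equivalent of stringr::str_remove(x, "[^0-9]+$"):
# delete the run of non-digit characters at the end of each string
sub("[^0-9]+$", "", x)
# [1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"

# substr() assumes a fixed two-character prefix; a hypothetical two-digit
# value shows the difference between the two approaches
substr("T12a", 1, 2)
# [1] "T1"
sub("[^0-9]+$", "", "T12a")
# [1] "T12"
```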
Upvotes: 1