Jeni

Reputation: 958

Remove everything after any digit in a pattern

I am trying to use gsub to remove every character after any digit in each of the values of a column of my dataframe:

Tumoral_stage   Methastatic_stage
    T1a                M0
    T1b                M0
    T2c                M0
    T3b                M0
    T1c                M0
    T2                 M0
    T3a                M1

I would like to get this dataframe:

Tumoral_stage   Methastatic_stage
    T1                 M0
    T1                 M0
    T2                 M0
    T3                 M0
    T1                 M0
    T2                 M0
    T3                 M1

I would like to do this with gsub, but I don't know how to write a pattern that removes everything after a numeric character.

Upvotes: 1

Views: 489

Answers (3)

Chris Ruehlemann

Reputation: 21400

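Consider capturing the part you want to keep and restoring it with the backreference \\1. Note that \\w matches only one character after the last digit, which is enough for the values shown: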

sub("(.*\\d)\\w", "\\1", x)
[1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"
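If a value could carry more than one character after the digit, a slightly more general pattern may help. The following is only a sketch: it assumes each value has a single digit of interest and keeps everything up to and including the first digit. The names x and trimmed are illustrative.

```r
# Sample values from the question
x <- c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a")

# ^(\\D*\\d) captures any leading non-digits plus the first digit;
# .*$ consumes whatever follows, and \\1 puts the captured part back
trimmed <- sub("^(\\D*\\d).*$", "\\1", x)
trimmed
# [1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"
```

Values without a trailing suffix, such as "T2", pass through unchanged because .* can match the empty string.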

Upvotes: 0

s_baldur

Reputation: 33498

Using sub() and positive lookbehind:

x <- c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a")

sub("(?<=[0-9]).+", "", x, perl = TRUE)

# [1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"
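Since the question is about a data frame column, here is a minimal sketch of applying the same lookbehind pattern to the column directly. The data frame name df is an assumption; the column names are taken from the question.

```r
# Rebuild the example data; `df` is a hypothetical name
df <- data.frame(
  Tumoral_stage = c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a"),
  Methastatic_stage = c("M0", "M0", "M0", "M0", "M0", "M0", "M1"),
  stringsAsFactors = FALSE
)

# Overwrite the column with the trimmed values;
# perl = TRUE is needed because lookbehind is a PCRE feature
df$Tumoral_stage <- sub("(?<=[0-9]).+", "", df$Tumoral_stage, perl = TRUE)
df$Tumoral_stage
# [1] "T1" "T1" "T2" "T3" "T1" "T2" "T3"
```

Because each value contains at most one match, sub() and gsub() give the same result here.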

Upvotes: 4

akrun

Reputation: 886968

We can also use substr, since every value here keeps exactly its first two characters:

substr(x, 1, 2)

Or with str_remove, which drops the trailing non-digit characters:

library(stringr)
str_remove(x, "[^0-9]+$")

data

x <- c("T1a", "T1b", "T2c", "T3b", "T1c", "T2", "T3a")
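The two approaches agree on the data above but can differ on other inputs. The sketch below uses a hypothetical vector y (not from the question) and base R's sub() with the same "[^0-9]+$" pattern in place of str_remove, so that it runs without stringr.

```r
# Hypothetical values, including a two-digit stage and a longer suffix
y <- c("T1a", "T2", "T10ab")

# Fixed-width approach: always keeps the first two characters
fixed_width <- substr(y, 1, 2)
fixed_width
# [1] "T1" "T2" "T1"

# Pattern approach: removes only the trailing run of non-digits
trailing <- sub("[^0-9]+$", "", y)
trailing
# [1] "T1"  "T2"  "T10"
```

Which behaviour is preferable depends on whether multi-digit stages can occur in the real data.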

Upvotes: 1
