Drew

Reputation: 593

How to extract the last digits of strings using regular expressions?

I have a bunch of colnames

L_1_3
L_2_23
L_3_91
L_3_16

I want to replace these colnames with new names using the last digits following the _ like this:

3
23
91
16

I've tried colnames(X) <- gsub("L_\\d\\d_", "", colnames(X)) which works for strings with double digits at the end. I want one that works for both single and double digits.

Thank you!

Upvotes: 14

Views: 1411

Answers (6)

hello_friend

Reputation: 5788

I think this might be the simplest regex:

sub(".*\\_", "", tmp)
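As a quick check, here is a minimal sketch of that call on the sample names from the question. The vector `tmp` is an assumption, since the answer does not define it, and the underscore is written without the escape because `_` has no special meaning in a regex.

```r
# Assumed sample data: the column names from the question
tmp <- c("L_1_3", "L_2_23", "L_3_91", "L_3_16")

# The greedy .* consumes everything up to and including the last underscore,
# so only the trailing digits remain after the replacement
res <- sub(".*_", "", tmp)
print(res)
# [1] "3"  "23" "91" "16"
```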

Upvotes: 0

Ian Campbell

Reputation: 24790

Here's an option with positive lookahead:

gsub(".+_(?=\\d+$)", "", X, perl = TRUE)
[1] "3"  "23" "91" "16"
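A self-contained sketch of the same call, assuming `X` holds the names from the question. The lookahead `(?=...)` only checks that digits follow up to the end of the string without consuming them, and it needs `perl = TRUE` because R's default regex engine does not support lookarounds.

```r
# Assumed sample data: the column names from the question
X <- c("L_1_3", "L_2_23", "L_3_91", "L_3_16")

# Remove everything up to the last underscore, but only when the rest of
# the string consists of digits; the digits themselves are not matched
res <- gsub(".+_(?=\\d+$)", "", X, perl = TRUE)
print(res)
```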

Upvotes: 12

The fourth bird

Reputation: 163362

If that is the pattern that works for you for 2 digits, the only thing you would have to do is to make one of the digits optional using ?

L_\\d\\d?_
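Plugged into a replacement call, this could look like the following sketch, which assumes `X` holds the names from the question and uses `sub` since each name contains the prefix only once:

```r
# Assumed sample data: the column names from the question
X <- c("L_1_3", "L_2_23", "L_3_91", "L_3_16")

# \\d\\d? matches one digit plus an optional second one,
# so prefixes such as "L_1_" and "L_12_" are both removed
res <- sub("L_\\d\\d?_", "", X)
print(res)
# [1] "3"  "23" "91" "16"
```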

If you need to match the whole string, you could use anchors to assert the start ^ and the end $ of the string, capture the trailing digits in a group, and use that group in the replacement.

^L_\\d\\d?_(\\d+)$

In parts

^      Start of string
L_     Match L_
\d     Match a digit
\d?    Optionally match a second digit
_      Match _
(      Capture group 1
  \d+  Match a digit and repeat 1 or more times
)      Close group
$      End of string

X <- c("L_1_3", "L_2_23", "L_3_91", "L_3_16")
gsub("^L_\\d\\d?_(\\d+)$", "\\1", X)

Output

[1] "3"  "23" "91" "16"

Upvotes: 5

akrun

Reputation: 887158

We can use str_extract

library(stringr)
str_extract(X, "\\d+$")
#[1] "3"  "23" "91" "16"

data

X <- c("L_1_3", "L_2_23", "L_3_91", "L_3_16")

Upvotes: 5

Daniel O

Reputation: 4358

Tried to keep it as simple as possible

sub(".*_(\\d+$)", "\\1", X)
[1] "3"  "23" "91" "16"

Upvotes: 2

Rui Barradas

Reputation: 76432

Here is a regular expression that does it.
It matches everything up to a non-digit character, followed by a capture group of one or more digits at the end of the string, and replaces the whole match with the capture group.

sub('.*[^[:digit:]]{1}([[:digit:]]+$)', '\\1', x)
#[1] "3"  "23" "91" "16"

A regex that works for single and double digits but neither more nor less would be

sub('.*[^[:digit:]]{1}([[:digit:]]{1,2}$)', '\\1', x)
#[1] "3"  "23" "91" "16"
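To see where the two patterns differ, here is a small sketch with a three-digit suffix added. The name "L_4_123" is made up for illustration and is not part of the question's data; the redundant `{1}` quantifier is omitted.

```r
# Hypothetical input: two names from the question plus one with three trailing digits
x2 <- c("L_1_3", "L_2_23", "L_4_123")

# One or more trailing digits: the three-digit suffix is extracted too
any_len <- sub('.*[^[:digit:]]([[:digit:]]+$)', '\\1', x2)

# One or two trailing digits only: "L_4_123" does not match,
# so sub() returns it unchanged
max_two <- sub('.*[^[:digit:]]([[:digit:]]{1,2}$)', '\\1', x2)

print(any_len)
print(max_two)
```

When the pattern does not match, `sub` leaves the input as it is, so names with longer suffixes would keep their original form rather than being truncated.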

Data

x <- scan(what = character(), text = '
L_1_3
L_2_23
L_3_91
L_3_16')

Upvotes: 4
