Jantje Houten

Reputation: 151

A shorter way to extract last set of digits starting from the back

I would like to extract the last set of digits from a string without chaining calls like this:

"sdkjfn45sdjk54()ad"

str_remove("sdkjfn45sdjk54()ad","[:alpha:]+$")
[1] "sdkjfn45sdjk54()"

str_remove(str_remove("sdkjfn45sdjk54()ad","[:alpha:]+$"), "\\(")
[1] "sdkjfn45sdjk54)"

str_remove(str_remove(str_remove("sdkjfn45sdjk54()ad","[:alpha:]+$"), "\\("), "\\)")
[1] "sdkjfn45sdjk54"

str_extract(str_remove(str_remove(str_remove("sdkjfn45sdjk54()ad","[:alpha:]+$"), "\\("), "\\)"), "\\d+$")
[1] "54"

I want to avoid this because the characters after the digits are not known in advance. I am aware that stringi can extract the last match directly (stri_extract_last_regex), but I need to stick to base R or stringr.

Thanks!

Upvotes: 0

Views: 190

Answers (2)

Ronak Shah

Reputation: 388982

You can use a regex with a negative lookahead.

string <- "sdkjfn45sdjk54()ad"
stringr::str_extract(string, '(\\d+)(?!.*\\d)')
#[1] "54"

Using the same regex in base R:

regmatches(string, gregexpr('(\\d+)(?!.*\\d)', string, perl = TRUE))[[1]]
#[1] "54"

This extracts the run of digits that is not followed by any other digit later in the string, which is the last set of digits.
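If you have a whole vector of strings, the same lookahead can be wrapped in a small base R helper. This is only a sketch: the name last_digits is made up for illustration, and returning NA when a string has no digits is an assumption about what you want.

```r
# Hypothetical helper (not from any package): last run of digits per string.
last_digits <- function(x) {
  # regexpr() returns the position of the first match, which for this
  # pattern is the digit run with no digit anywhere after it.
  m <- regexpr("\\d+(?!.*\\d)", x, perl = TRUE)
  hit <- !is.na(m) & m > -1L
  # Assumption: strings without digits should yield NA.
  out <- rep(NA_character_, length(x))
  # regmatches() with a regexpr() result returns only the matched elements.
  out[hit] <- regmatches(x, m)
  out
}

res <- last_digits(c("a", "sdkjfn45sdjk54()ad", "x1y22z333"))
# res holds NA, "54", "333"
```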

Upvotes: 2

r2evans

Reputation: 160447

Use str_extract_all and grab just the last one in each vector.

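library(stringr)
quux <- str_extract_all(c("a", "sdkjfn45sdjk54()ad"), "[0-9]+")
# Take the last match of each element, or NA when nothing matched.
# (Passing lengths(quux) straight to `[` would hand the whole vector of
# lengths to every element, which only works by accident for this input.)
sapply(quux, function(x) if (length(x)) x[length(x)] else NA_character_)
# [1] NA   "54"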

I use sapply because I'm guessing that you have more than one string. str_extract_all returns a list with one element per input string, and each element holds zero or more matches. Since we only want one match per string, sapply lets us pick it out and simplify the result to a character vector.

One might be tempted to use sapply(., tail, 1), but when no match is found, tail returns character(0) rather than NA, so the result can no longer be simplified to a vector and comes back as a list. I'm inferring that NA would be a good return value when the pattern is not found.
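To make the tail caveat concrete, here is a small sketch. It assumes base R's regmatches/gregexpr as a stand-in for str_extract_all (both give a list with character(0) for strings without a match), and the names matches, naive and safe are purely illustrative.

```r
x <- c("a", "sdkjfn45sdjk54()ad")
# Base R stand-in for str_extract_all(x, "[0-9]+"): one list element per string.
matches <- regmatches(x, gregexpr("[0-9]+", x))

# tail() on an empty element gives character(0), so sapply() cannot
# simplify and returns a list instead of a character vector.
naive <- sapply(matches, tail, 1)

# Prepending NA guarantees at least one element, so tail() always returns
# exactly one value and vapply() can enforce a character vector.
safe <- vapply(matches, function(m) tail(c(NA_character_, m), 1), character(1))
# safe holds NA, "54"
```

vapply is used in the second call so that a result of the wrong shape fails loudly instead of silently turning into a list.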

Upvotes: 1
