Sasha

Reputation: 6039

Splitting a string with a number at the end in R

How can I split a string that ends with a number (with an unknown number of digits) into two strings: the trailing number and the rest of the string? Note that there may be other numbers earlier in the string, and these should not be affected. For example:

"abc665abc12"   -> "abc665abc", "12"
"abc665abc 182" -> "abc665abc", "182"
"abc665abc0"    -> "abc665abc", "0"

Thanks!

Upvotes: 3

Views: 756

Answers (5)

jeremycg

Reputation: 24945

In base R:

# x is the OP's example vector
x <- c("abc665abc12", "abc665abc 182", "abc665abc0")
cbind(x,
      gsub("[ 0-9]+$", "", x),
      gsub("^[a-z 0-9]+[a-z ]+", "", x))

     x                                
[1,] "abc665abc12"   "abc665abc" "12" 
[2,] "abc665abc 182" "abc665abc" "182"
[3,] "abc665abc0"    "abc665abc" "0" 

Upvotes: 2

Bg1850

Reputation: 3082

A solution using plain regular expressions, shown with two of your strings:

    x <- "abc665abc12"
    y <- "abc665abc 182"
    patterns <- "[[:digit:]]+$"
    m1 <- regexpr(patterns, x)
    m2 <- regexpr(patterns, y)

Now regmatches(x, m1) yields "12" and regmatches(y, m2) yields "182".
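The same idea can be applied to a whole character vector at once, and the leading part can be recovered with sub. This is only a minimal sketch, assuming every element ends in at least one digit (regmatches silently drops elements with no match); the names num and rest are made up for illustration.

```r
# OP's example vector
x <- c("abc665abc12", "abc665abc 182", "abc665abc0")

# Locate the trailing run of digits in each element
m <- regexpr("[[:digit:]]+$", x)

# The trailing number, one per element (assumes every element matches)
num <- regmatches(x, m)

# Everything before the number, dropping any whitespace that separates the two
rest <- sub("[[:space:]]*[[:digit:]]+$", "", x)

data.frame(rest, num)
```

Anchoring the pattern with $ is what keeps the earlier digits ("665") untouched.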

Upvotes: 0

Avinash Raj

Reputation: 174696

You may also use strsplit

> x = c("abc665abc12", "abc665abc 182", "abc665abc0")
> strsplit(x, "(?<=[A-Za-z])\\s*(?=\\d+$)", perl = TRUE)
[[1]]
[1] "abc665abc" "12"       

[[2]]
[1] "abc665abc" "182"      

[[3]]
[1] "abc665abc" "0"  

Upvotes: 8

hwnd

Reputation: 70722

When it comes to things like this, I like using strapply from the gsubfn package:

library(gsubfn)
strapply('abc665abc12', '(.*?) *(\\d+)$', c)[[1]]
# [1] "abc665abc" "12" 

If you have a character vector, it's the same concept:

strapply(x, '(.*?) *(\\d+)$', c)

Upvotes: 3

Frank

Reputation: 66819

This works:

# op's example
x = c("abc665abc12", "abc665abc 182", "abc665abc0")

library(stringi)
res = stri_match_first_regex(x, "^(.*?) ?([0-9]+)$")
res

     [,1]            [,2]        [,3] 
[1,] "abc665abc12"   "abc665abc" "12" 
[2,] "abc665abc 182" "abc665abc" "182"
[3,] "abc665abc0"    "abc665abc" "0"  

Your desired parts are in columns 2 & 3, corresponding to the parentheses in the regex.
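For readers without stringi, the capture-group idea can be approximated in base R with regexec and regmatches. This is a swapped-in technique, not the code above: the pattern is rewritten without the lazy quantifier, and it assumes each string has a non-digit, non-space character somewhere before the trailing number. The names parts, rest and num are hypothetical.

```r
# OP's example vector
x <- c("abc665abc12", "abc665abc 182", "abc665abc0")

# Group 1: everything up to the last non-digit, non-space character
# Group 2: the trailing digits; optional whitespace between them is discarded
pattern <- "^(.*[^[:digit:][:space:]])[[:space:]]*([[:digit:]]+)$"

# regexec records the capture groups; regmatches returns, per element,
# the full match followed by each group
parts <- do.call(rbind, regmatches(x, regexec(pattern, x)))

rest <- parts[, 2]              # the text before the number
num  <- as.integer(parts[, 3])  # the trailing number, converted to integer
```

As with the stringi matrix, column 1 holds the full match and columns 2 and 3 hold the two groups.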

Upvotes: 6
