Mark Miller

Reputation: 13103

split string at first number

I would like to split strings between the last letter and first number:

dat <- read.table(text = "
        x         y    
        a1        0.1
        a2        0.2
        a3        0.3
        a4        0.4
        df1       0.1
        df2       0.2
        df13      0.3
        df24      0.4
        fcs111    0.1
        fcs912    0.2
        fcs113    0.3
        fcsb8114  0.4", 
 header=TRUE, stringsAsFactors=FALSE)

desired.result <- read.table(text = "
        x1    x2     y    
        a     1      0.1
        a     2      0.2
        a     3      0.3
        a     4      0.4
        df    1      0.1
        df    2      0.2
        df    13     0.3
        df    24     0.4
        fcs   111    0.1
        fcs   912    0.2
        fcs   113    0.3
        fcsb  8114   0.4", 
 header=TRUE, stringsAsFactors=FALSE)

There are a number of similar questions on StackOverflow, but I cannot find this exact situation. I know this must be a basic question. If I put a couple of hours into it I could probably figure it out. Sorry. Thank you for any suggestions. I prefer base R. If this is a duplicate I can delete it.

Upvotes: 6

Views: 1481

Answers (4)

Richie Cotton

Reputation: 121057

The stringr package makes this slightly more readable. In the following example, [[:alpha:]] and [[:digit:]] are locale-independent character classes for letters and digits, respectively.

library(stringr)
# Note the "+" after [[:digit:]]: without it only the first digit is captured
matches <- str_match(dat$x, "([[:alpha:]]+)([[:digit:]]+)")
desired.result <- data.frame(
  x1 = matches[, 2], 
  x2 = as.numeric(matches[, 3]), 
  y  = dat$y
)
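
Since the question prefers base R, here is a minimal sketch of the same idea with regexec and regmatches. The vectors x and y are a hand-copied subset of dat, and the ^ and $ anchors are my own addition; they assume every value is letters followed by digits and nothing else.

```r
# Subset of the question's data, copied by hand for illustration
x <- c("a1", "df13", "fcs111", "fcsb8114")
y <- c(0.1, 0.3, 0.1, 0.4)

# regexec() records the full match plus each capture group;
# regmatches() extracts them as a list of character vectors.
m <- regmatches(x, regexec("^([[:alpha:]]+)([[:digit:]]+)$", x))

# One row per string: column 1 = full match, 2 = letters, 3 = digits.
# Caveat: a string that does not match yields character(0) and would be
# silently dropped by rbind, so check nrow(parts) == length(x) on real data.
parts <- do.call(rbind, m)

result <- data.frame(
  x1 = parts[, 2],
  x2 = as.numeric(parts[, 3]),
  y  = y,
  stringsAsFactors = FALSE
)
result
```

Each element of m has length 3 here, which is why do.call(rbind, m) gives a clean three-column character matrix.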

Upvotes: 1

Tyler Rinker

Reputation: 109844

A method using gsub and strsplit:

data.frame(do.call(rbind, strsplit(gsub("([a-zA-Z])([0-9])", "\\1_\\2", 
    dat$x), "_")), y = dat$y)

##      X1   X2   y
## 1     a    1 0.1
## 2     a    2 0.2
## 3     a    3 0.3
## 4     a    4 0.4
## 5    df    1 0.1
## 6    df    2 0.2
## 7    df   13 0.3
## 8    df   24 0.4
## 9   fcs  111 0.1
## 10  fcs  912 0.2
## 11  fcs  113 0.3
## 12 fcsb 8114 0.4

This shows what's happening at each stage:

(a <- gsub("([a-zA-Z])([0-9])", "\\1_\\2", dat$x))
(b <- strsplit(a, "_"))
(d <- do.call(rbind, b))
data.frame(d, y = dat$y)
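
One caveat, sketched under assumptions: gsub inserts the separator at every letter-to-digit boundary, so a value with more than one such boundary splits into more than two pieces. The value "a1b2" below is a made-up edge case that does not appear in the question's data. Swapping in sub touches only the first boundary. Both versions also assume that no value already contains "_".

```r
x <- c("a1", "df13", "a1b2")   # "a1b2" is a hypothetical edge case

# gsub() marks every letter-digit boundary
all_boundaries <- gsub("([a-zA-Z])([0-9])", "\\1_\\2", x)
all_boundaries
# "a_1"  "df_13"  "a_1b_2"

# sub() marks only the first one
first_boundary <- sub("([a-zA-Z])([0-9])", "\\1_\\2", x)
first_boundary
# "a_1"  "df_13"  "a_1b2"

# With sub(), every string splits into exactly two pieces,
# so rbind() produces a clean two-column matrix.
parts <- do.call(rbind, strsplit(first_boundary, "_"))
parts
```

For the data in the question the two functions give the same result, because each value has exactly one letter-to-digit boundary.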

Upvotes: 2

CHP

Reputation: 17189

You can use the strsplit function and supply a regex pattern as the split argument:

cbind(dat, do.call(rbind, strsplit(dat$x, split = "(?<=[a-zA-Z])(?=[0-9])", perl = TRUE)))
##           x   y    1    2
## 1        a1 0.1    a    1
## 2        a2 0.2    a    2
## 3        a3 0.3    a    3
## 4        a4 0.4    a    4
## 5       df1 0.1   df    1
## 6       df2 0.2   df    2
## 7      df13 0.3   df   13
## 8      df24 0.4   df   24
## 9    fcs111 0.1  fcs  111
## 10   fcs912 0.2  fcs  912
## 11   fcs113 0.3  fcs  113
## 12 fcsb8114 0.4 fcsb 8114

Upvotes: 4

anubhava

Reputation: 784998

You can use lookarounds:

(?<=[a-zA-Z])(?=[0-9])
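
To illustrate what this pattern matches, here is a small base R sketch. It assumes a hand-copied subset of the question's data and uses regexpr with perl = TRUE (lookarounds need the PCRE engine in R). The variable names are mine, chosen for illustration.

```r
x <- c("a1", "df13", "fcsb8114")   # subset of the question's data

# The lookarounds consume no characters, so the match has length 0.
# regexpr() reports where it sits: the index of the first digit
# that directly follows a letter.
pos <- as.integer(regexpr("(?<=[a-zA-Z])(?=[0-9])", x, perl = TRUE))
pos
# 2 3 5

# Cut each string at that position
alpha_part <- substr(x, 1, pos - 1)
digit_part <- substr(x, pos, nchar(x))

data.frame(x1 = alpha_part, x2 = as.numeric(digit_part))
```

regexpr returns -1 for a string with no letter-to-digit boundary, so real data would need a check for that before calling substr.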

Upvotes: 5
