Splitting character column by varying position in R

I am having trouble splitting a simple character column into 3 columns depending on the content of the column. A very simple example:

data <- data.frame(x = c("GUIC01", "GUI02"))

> data
       x
1 GUIC01
2  GUI02

I want to create new columns to produce this:

> desired
       x Parc TipusBassa Num
1 GUIC01  GUI          C  01
2  GUI02  GUI       <NA>  02

Basically, if the value has a C in the middle, that C must go into its own column (TipusBassa) and the rest of the value is split into Parc and Num; if there is no C, TipusBassa is NA. So far I have tried this approach:

data<-if_else(nchar(data$x) == 5, 
                separate(data, into = c('Parc','Num'), sep = c(3)), 
                separate(data, into = c('Parc', 'TipusBassa','Num'), sep = c(3,4)))

What am I missing? Thanks a lot!
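A likely reason the attempt fails: `dplyr::if_else()` chooses between values element by element, so it cannot pick between two whole data frames, and `separate()` is also called without the column to split (`x`). The same `nchar()` idea can still work if the condition is applied to each new column instead. This is a minimal sketch that assumes, as in the example, that every code is three letters, an optional type letter, and two digits, so a length of 6 means the letter is present.

```r
# stringsAsFactors = FALSE keeps x as character on R versions before 4.0
data <- data.frame(x = c("GUIC01", "GUI02"), stringsAsFactors = FALSE)

# TRUE where the code carries the optional type letter (6 characters, e.g. "GUIC01")
has_type <- nchar(data$x) == 6

# Parc: always the first three characters
data$Parc <- substr(data$x, 1, 3)

# TipusBassa: the fourth character when present, otherwise NA
data$TipusBassa <- ifelse(has_type, substr(data$x, 4, 4), NA)

# Num: always the last two characters
data$Num <- substr(data$x, nchar(data$x) - 1, nchar(data$x))

data
```

Base `ifelse()` is used here because it works element by element on vectors, which is the behaviour the original `if_else()` call was relying on.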

Answers (2)

Ronak Shah

You can use tidyr::extract and pass a regular expression whose capture groups become the new columns.

tidyr::extract(data, x, c('Parc', 'TipusBassa', 'Num'), 
               '([A-Z]{3})([A-Z]?)([0-9]{2})', remove = FALSE)

#       x Parc TipusBassa Num
#1 GUIC01  GUI          C  01
#2  GUI02  GUI             02
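In the output above, the optional group `([A-Z]?)` matches an empty string for `GUI02`, so `TipusBassa` comes back blank rather than `<NA>` as in the desired table. A minimal base R sketch of the same pattern (anchored with `^` and `$`), using `regexec()` and `regmatches()` instead of tidyr, turns that empty match into `NA`. It assumes every value in `x` matches the pattern; a non-matching value would be dropped from the matrix and misalign the rows.

```r
# stringsAsFactors = FALSE keeps x as character on R versions before 4.0
data <- data.frame(x = c("GUIC01", "GUI02"), stringsAsFactors = FALSE)

# Three letters, an optional letter, two digits
pattern <- "^([A-Z]{3})([A-Z]?)([0-9]{2})$"

# For each string: the full match followed by the three captured groups
parts <- regmatches(data$x, regexec(pattern, data$x))

# Drop the full match and stack the groups into a character matrix (one row per string)
m <- do.call(rbind, lapply(parts, `[`, -1))

data$Parc <- m[, 1]
# An optional group that matched nothing gives "", so replace it with NA
data$TipusBassa <- ifelse(nzchar(m[, 2]), m[, 2], NA)
data$Num <- m[, 3]

data
```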

Tim Biegeleisen

We can use the base string functions here:

data$TipusBassa <- ifelse(sub("^.*(.).{2}$", "\\1", data$x) == "C", "C", NA)
data$Num <- sub("^.*(..)$", "\\1", data$x)
data

       x TipusBassa Num
1 GUIC01          C  01
2  GUI02       <NA>  02

Data:

data <- data.frame(x = c("GUIC01", "GUI02"))
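This answer fills the type and number columns but not the Parc column from the desired output. A sketch adding it in the same `sub()` style, assuming the Parc code is always the first three characters:

```r
# stringsAsFactors = FALSE keeps x as character on R versions before 4.0
data <- data.frame(x = c("GUIC01", "GUI02"), stringsAsFactors = FALSE)

# Parc: keep only the first three characters
data$Parc <- sub("^(.{3}).*$", "\\1", data$x)

# TipusBassa: the character just before the last two, kept only if it is "C"
data$TipusBassa <- ifelse(sub("^.*(.).{2}$", "\\1", data$x) == "C", "C", NA)

# Num: the last two characters
data$Num <- sub("^.*(..)$", "\\1", data$x)

data
```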
