Reputation: 170
I have a dataset in which one column has values in the format [A-Z][A-Z][0-1][0-9][0-1][0-1][0-1][0-9][0-9],
e.g. AC1200019.
Now I want to convert this format to [A-Z][A-Z][-][0-1][0-9][-][0-1][0-1][0-1][-][0-9][0-9],
e.g. AC-12-000-19.
Upvotes: 0
Views: 226
Reputation: 596
Assuming every value in the column has the same number of characters, here is a simple version.
library(stringr)
x <- data.frame(X1 = c("AC1510018", "AC1200019", "BT1801007"))
paste(str_sub(x$X1,1,2), str_sub(x$X1,3,4),
str_sub(x$X1,5,7), str_sub(x$X1,8,9) , sep= "-")
I like the dplyr suite, so here is a version using dplyr and tidyr:
library(dplyr)
library(tidyr)
x %>%
separate(X1, into = c("X2", "X3", "X4", "X5"), sep = c(2,4,7)) %>%
unite("X1", X2, X3, X4, X5, sep="-")
or
x %>%
transmute(X2 = paste(str_sub(X1,1,2), str_sub(X1,3,4),
str_sub(X1,5,7), str_sub(X1,8,9) , sep= "-"))
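Since the position-based versions above depend on every value having the same width, it may be worth checking that first. A minimal sketch in base R, assuming the same example data frame `x` with column `X1`; the variable name `same_width` is made up for illustration:

```r
# Example data from this answer; stringsAsFactors = FALSE keeps X1 as character
x <- data.frame(X1 = c("AC1510018", "AC1200019", "BT1801007"),
                stringsAsFactors = FALSE)

# TRUE only if every value is exactly 9 characters long,
# which is what the fixed positions 1-2, 3-4, 5-7, 8-9 assume
same_width <- all(nchar(x$X1) == 9)
same_width
```

If this comes back FALSE, a regex-based approach that only rewrites matching values is probably the safer choice.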
Upvotes: 0
Reputation: 887571
Try
str1 <- 'AC1200019'
gsub('^([A-Z]{2})([0-1][0-9])([0-1]{3})([0-9]{2})', '\\1-\\2-\\3-\\4', str1)
#[1] "AC-12-000-19"
Upvotes: 1
Reputation: 67988
([A-Z][A-Z])([0-1][0-9])([0-1][0-1][0-1])([0-9][0-9])
Try this. Replace with $1-$2-$3-$4, or with \\1-\\2-\\3-\\4 in R.
See the demo.
https://regex101.com/r/uK9cD8/5
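In R, the pattern above could be applied to a whole column with base `sub()`. This is only a sketch: the data frame `df` and the column names `code` and `code_fmt` are hypothetical, and the `^`/`$` anchors are my addition so that values not matching the full format are left untouched.

```r
# Hypothetical example data; `df`, `code` and `code_fmt` are illustrative names
df <- data.frame(code = c("AC1200019", "BT1801007", "bad"),
                 stringsAsFactors = FALSE)

# Same four capture groups as above, anchored to the whole string (an assumption)
pattern <- "^([A-Z][A-Z])([0-1][0-9])([0-1][0-1][0-1])([0-9][0-9])$"

# sub() is vectorised over the column; non-matching values are returned unchanged
df$code_fmt <- sub(pattern, "\\1-\\2-\\3-\\4", df$code)
df$code_fmt
#[1] "AC-12-000-19" "BT-18-010-07" "bad"
```

Using `sub()` rather than `gsub()` is enough here because the anchored pattern can match at most once per value.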
Upvotes: 1