Reputation: 3432
Hello, I have a data frame such as:
COL1
BLOC1.1_3_10-355(+)Sp_3
BLOC2.1_10-355(-)SSp_4
BLOC3.1_10-355(+)SP_32
BLOC1_3_10-355(+)SP4_2
How can I write a regex that replaces the _
in this position: _[Number]-[Number](
with a colon, giving
:[Number]-[Number](
The expected result is:
COL1
BLOC1.1_3:10-355(+)Sp_3
BLOC2.1:10-355(-)SSp_4
BLOC3.1:10-355(+)SP_32
BLOC1_3:10-355(+)SP4_2
I tried: gsub("_[0-9]-[0-9](",":[0-9]-[0-9](",df$COL1)
Upvotes: 1
Views: 221
Reputation: 163207
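You can use the pattern
_([0-9]+-[0-9]+\()
and replace each match with : followed by capture group 1, i.e. :\1 (written ":\\1" in an R string). The parenthesis is escaped as \( so that it matches a literal ( instead of opening a group, and [0-9]+ matches one or more digits rather than exactly one.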
COL1 <- c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4", "BLOC3.1_10-355(+)SP_32", "BLOC1_3_10-355(+)SP4_2")
gsub("_([0-9]+-[0-9]+\\()", ":\\1", COL1)
Output
[1] "BLOC1.1_3:10-355(+)Sp_3" "BLOC2.1:10-355(-)SSp_4"
[3] "BLOC3.1:10-355(+)SP_32" "BLOC1_3:10-355(+)SP4_2"
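Since the question works on df$COL1 rather than a bare vector, the same call can be assigned back to the column. This is only a sketch: the data frame df below is rebuilt from the values in the question and is an assumption about what the real data looks like.

```r
# Hypothetical data frame mirroring the question's example
df <- data.frame(
  COL1 = c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4",
           "BLOC3.1_10-355(+)SP_32", "BLOC1_3_10-355(+)SP4_2"),
  stringsAsFactors = FALSE
)

# Replace the underscore before digits-digits( with a colon, keeping group 1
df$COL1 <- gsub("_([0-9]+-[0-9]+\\()", ":\\1", df$COL1)
df
```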
Upvotes: 4
Reputation: 520898
A solution using string splitting:
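output <- sapply(COL1, function(x) {
  # Split at the underscore that is followed by digits and a hyphen
  parts <- strsplit(x, "_(?=\\d+-)", perl=TRUE)
  # Rejoin the two pieces with a colon
  paste(parts[[1]][1], parts[[1]][2], sep=":")
})
names(output) <- c(1:4)
output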
1 2 3
"BLOC1.1_3:10-355(+)Sp_3" "BLOC2.1:10-355(-)SSp_4" "BLOC3.1:10-355(+)SP_32"
4
"BLOC1_3:10-355(+)SP4_2"
Data:
COL1 <- c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4",
"BLOC3.1_10-355(+)SP_32", "BLOC1_3_10-355(+)SP4_2")
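A lookahead like the one used for splitting can probably also be passed straight to gsub, which would avoid the split-and-paste step. The following is a sketch under that assumption, not part of the original answer; the name res is only illustrative, and the lookahead is extended to require the full digits-digits( sequence.

```r
COL1 <- c("BLOC1.1_3_10-355(+)Sp_3", "BLOC2.1_10-355(-)SSp_4",
          "BLOC3.1_10-355(+)SP_32", "BLOC1_3_10-355(+)SP4_2")

# perl=TRUE enables lookaheads; only the underscore itself is matched and
# replaced, so no capture group or backreference is needed
res <- gsub("_(?=[0-9]+-[0-9]+\\()", ":", COL1, perl=TRUE)
res
```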
Upvotes: 3
Reputation: 27732
COL1 <- c("BLOC1.1_3_10-355(+)Sp_3",
"BLOC2.1_10-355(-)SSp_4",
"BLOC3.1_10-355(+)SP_32",
"BLOC1_3_10-355(+)SP4_2")
gsub( "(.*[0-9]+)(_)([0-9]+-.*)", "\\1:\\3", COL1)
[1] "BLOC1.1_3:10-355(+)Sp_3" "BLOC2.1:10-355(-)SSp_4" "BLOC3.1:10-355(+)SP_32"
[4] "BLOC1_3:10-355(+)SP4_2"
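The pattern above relies on the greedy .* backtracking to the last underscore that has a digit before it and digits plus a hyphen after it. Because the underscore itself is discarded, its capture group looks optional; the variant below is a sketch of that simplification (res2 is an illustrative name), with the second group now referenced as \\2.

```r
COL1 <- c("BLOC1.1_3_10-355(+)Sp_3",
          "BLOC2.1_10-355(-)SSp_4",
          "BLOC3.1_10-355(+)SP_32",
          "BLOC1_3_10-355(+)SP4_2")

# Group 1: everything up to a digit run; group 2: digits, hyphen, and the rest
res2 <- gsub("(.*[0-9]+)_([0-9]+-.*)", "\\1:\\2", COL1)
res2
```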
Upvotes: 4