Reputation: 53
I want to reformat some genome changes so I can use a certain tool. How can I move the first two characters of a string to just after the colon in that same string?
For example:
g.chr17:7577121G>A must become chr17:g.7577121G>A
g.chr3:52712586T>C must become chr3:g.52712586T>C
There is probably a very straightforward way to do this with gsub and paste, but I can't figure it out.
Upvotes: 3
Views: 736
Reputation: 887481
Here is one without a regex replacement; it splits on the dot and the colon, then pastes the pieces back in the new order
input <- "g.chr17:7577121G>A"
# split on "." and ":" to get "g", "chr17" and "7577121G>A"
v1 <- strsplit(input, "[.:]")[[1]]
paste0(v1[2], ":", v1[1], ".", v1[3])
#[1] "chr17:g.7577121G>A"
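Since the question has more than one string, the same split-and-paste idea can be applied to a whole vector. This is only a sketch: move_prefix is a hypothetical helper name, not part of the answer above, and it assumes every string contains exactly one dot before exactly one colon.

```r
# Hypothetical helper (not from the original answer): applies the
# split-and-paste idea to every element of a character vector.
# Assumes each string has exactly one "." followed later by exactly one ":".
move_prefix <- function(x) {
  parts <- strsplit(x, "[.:]")  # list with one character vector per input string
  vapply(parts,
         function(p) paste0(p[2], ":", p[1], ".", p[3]),
         character(1))
}

x <- c("g.chr17:7577121G>A", "g.chr3:52712586T>C")
move_prefix(x)
#[1] "chr17:g.7577121G>A" "chr3:g.52712586T>C"
```

If a string could contain further dots or colons, the third piece would be cut short, so one of the sub-based answers is probably the safer choice in that case.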
Upvotes: 2
Reputation: 389105
We can use sub with 3 capture groups
sub("(^.{2})(.*:)(.*)", "\\2\\1\\3", x)
#[1] "chr17:g.7577121G>A" "chr3:g.52712586T>C"
(^.{2}) - The first capture group is the first two characters.
(.*:) - The second capture group is everything up to and including the colon.
(.*) - The third capture group is the rest of the string.
We then arrange these groups in the order 2-1-3.
data
x <- c("g.chr17:7577121G>A", "g.chr3:52712586T>C")
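One thing to keep in mind is that .* is greedy, so the second group runs up to the last colon in the string. For the inputs in the question there is only one colon, so it makes no difference. The sketch below uses a variant with [^:]* that stops at the first colon; the string y is made up purely to show where the two patterns differ and is not a real variant notation.

```r
x <- c("g.chr17:7577121G>A", "g.chr3:52712586T>C")

# Same three-group idea, but the second group stops at the first colon.
first_colon <- sub("^(.{2})([^:]*:)(.*)", "\\2\\1\\3", x)
first_colon
#[1] "chr17:g.7577121G>A" "chr3:g.52712586T>C"

# Made-up input with two colons, only to show where the patterns differ.
y <- "g.chr17:100:200"
greedy  <- sub("(^.{2})(.*:)(.*)", "\\2\\1\\3", y)     # "chr17:100:g.200"
bounded <- sub("^(.{2})([^:]*:)(.*)", "\\2\\1\\3", y)  # "chr17:g.100:200"
```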
Upvotes: 3
Reputation: 521914
Try this option:
input <- "g.chr17:7577121G>A"
input <- sub("^([^.]+\\.)([^:]+:)", "\\2\\1", input)
input
[1] "chr17:g.7577121G>A"
The pattern might require some explanation:
^ from the beginning of the input
([^.]+\\.) match and capture any non-dot characters up to and including the first dot
([^:]+:) then match and capture any non-colon characters up to and including the first colon
Then, we replace with these two captured groups reversed. In this case, the first group is "g." and the second group is "chr17:". So, the replacement string would then start with "chr17:g.", followed by whatever was already there.
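Because sub is vectorized, the same call should also work on both strings from the question at once. The sketch below additionally checks that a second pass leaves already-converted strings alone, since the pattern no longer matches them; the variable names are just for illustration.

```r
x <- c("g.chr17:7577121G>A", "g.chr3:52712586T>C")
pattern <- "^([^.]+\\.)([^:]+:)"

# sub() works element-wise on the whole character vector.
converted <- sub(pattern, "\\2\\1", x)
converted
#[1] "chr17:g.7577121G>A" "chr3:g.52712586T>C"

# On the converted strings there is no colon after the first dot,
# so the pattern does not match and sub() returns them unchanged.
again <- sub(pattern, "\\2\\1", converted)
identical(again, converted)
#[1] TRUE
```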
Upvotes: 3