Reputation: 405
I have a factor with many levels and I want to insert a colon between the digits in all my data, but I do not know how to use the gsub function for this. For example, the data I have looks like this:
ABCD*0801
ABCD*0701
ABCD*0902
ABCD*0311
ABCD*2001
and what I want is this:
ABCD*08:01
ABCD*07:01
ABCD*09:02
ABCD*03:11
ABCD*20:01
I used the code below, but I don't understand it:
gsub("(.{4})(.*)$", "\\1:\\2",hladata$DRB1_1)
Can you please help me?
Upvotes: 1
Views: 5819
Reputation: 18541
This should work:
x <- as.factor(c("ABCD*0801",
"ABCD*0701",
"ABCD*0902",
"ABCD*0311",
"ABCD*2001"))
as.factor(gsub("(\\d{2}$)",":\\1", x))
#> [1] ABCD*08:01 ABCD*07:01 ABCD*09:02 ABCD*03:11 ABCD*20:01
#> Levels: ABCD*03:11 ABCD*07:01 ABCD*08:01 ABCD*09:02 ABCD*20:01
Created on 2021-07-30 by the reprex package (v0.3.0)
As @Roland points out in the comments, it’s more efficient to use sub on the factor levels():
x <- as.factor(c("ABCD*0801",
"ABCD*0701",
"ABCD*0902",
"ABCD*0311",
"ABCD*2001"))
levels(x) <- sub("(\\d{2}$)",":\\1", levels(x))
x
#> [1] ABCD*08:01 ABCD*07:01 ABCD*09:02 ABCD*03:11 ABCD*20:01
#> Levels: ABCD*03:11 ABCD*07:01 ABCD*08:01 ABCD*09:02 ABCD*20:01
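To illustrate why working on levels() can be cheaper, here is a minimal sketch using a hypothetical factor y (not from the question) in which values repeat. The substitution then runs once per distinct level rather than once per element:

```r
# Hypothetical factor: five observations, only two distinct levels
y <- factor(c("ABCD*0801", "ABCD*0701", "ABCD*0801", "ABCD*0801", "ABCD*0701"))

n_obs    <- length(y)   # number of elements in the vector
n_levels <- nlevels(y)  # number of strings the regex has to process

# Rewrite the level labels; every element of y picks up its new label
levels(y) <- sub("(\\d{2})$", ":\\1", levels(y))

y_chr <- as.character(y)
print(y_chr)
```

With real data the gain depends on how many repeated values the factor holds; for a handful of rows either approach is fine.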
Upvotes: 1
Reputation: 21400
Here are two options with sub (since you have just one match per string, gsub, which is for multiple matches per string, is not necessary):
sub("(\\d{2})", "\\1:", x)
This works via backreference: the text matched by the capturing group in the first argument (the first occurrence of two digits) is remembered and repeated in the replacement argument as \\1, and a : is added to it.
sub("(?<=\\d{2})(?=\\d{2})", ":", x, perl = TRUE)
This more complex solution works with lookaround: the lookbehind (?<=\\d{2}) looks for two digits on the left of the match, while the lookahead (?=\\d{2}) looks for two digits on the right. Where the two lookarounds match, a : is inserted.
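A minimal sketch of the lookaround variant, assuming every string ends in exactly four digits as in the question. It also checks the result against an end-anchored capture-group pattern; the variable names are illustrative only.

```r
# Illustrative input; same shape as the question's data
codes <- c("ABCD*0801", "ABCD*2001")

# Zero-width match: nothing is consumed, ":" is inserted at the position
# that has two digits before it and two digits after it
res_lookaround <- sub("(?<=\\d{2})(?=\\d{2})", ":", codes, perl = TRUE)

# Capture-group alternative anchored at the end of the string
res_anchored <- sub("(\\d{2})$", ":\\1", codes)

same <- identical(res_lookaround, res_anchored)
print(res_lookaround)
```

perl = TRUE is required here because lookbehind is not supported by R's default regex engine.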
The code you used does not work because of the quantifier: you need to change {4} to {7}, as there are seven characters before the point where you want to insert the :. It works like the first option above, namely via backreference: \\1 remembers and repeats the seven characters captured by the first capturing group (...), while the second backreference \\2 remembers and repeats the second capturing group; between them, : is added.
gsub("(.{7})(.*)$", "\\1:\\2", x)
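To see the effect of the quantifier, here is a short side-by-side sketch on a single string (the variable names are illustrative): with {4} the colon lands after the fourth character, with {7} after the seventh.

```r
code <- "ABCD*0801"

# {4}: first group is "ABCD", so ":" ends up before the "*"
four_chars <- gsub("(.{4})(.*)$", "\\1:\\2", code)

# {7}: first group is "ABCD*08", so ":" ends up between the digit pairs
seven_chars <- gsub("(.{7})(.*)$", "\\1:\\2", code)

print(c(four_chars, seven_chars))
```

Counting characters this way only works if every string has the same prefix length; the digit-based patterns above do not depend on that.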
Data:
x <- as.factor(c("ABCD*0801",
"ABCD*0701",
"ABCD*0902",
"ABCD*0311",
"ABCD*2001"))
Upvotes: 3