Reputation: 43199
I have a number of column names that can be represented by the following pattern.
dat <- c("Male97","Male98","Male99", "Male100andover","Female0","Female1" ,"Female2", "Female3", "Female4" ,"Female5", "Female100andover")
I am trying to add a delimiting character, e.g. a dash, between the letters and the numeric characters using a regex.
My desired output is, for example, Male-97 or Female-0. However, I do not want the delimiting character inserted after the numeric characters in cases of '100 and over'.
I have tried the following regex:
gsub('([e])[0-9]', '-', dat)
It nearly works. I need something that does not substitute the 'e' with a dash.
Can someone help me with this, please?
Upvotes: 2
Views: 367
Reputation: 626738
Your ([e])[0-9] regex matches and captures e followed by a digit, even if the digit is not at the end of the string. Since the replacement is only -, both the e and the digit are consumed and lost. You could try adding another capturing group, ([0-9]), and restoring both groups in the replacement, but that would also change values such as Male100andover.
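To make both failure modes concrete, here is a small sketch run against three of the names from the question. The variable names x, attempt1 and attempt2 are made up for illustration.

```r
x <- c("Male97", "Female0", "Male100andover")

# Original attempt: "e" and the first digit are both consumed and replaced
attempt1 <- gsub("([e])[0-9]", "-", x)
print(attempt1)
# [1] "Mal-7"         "Femal-"        "Mal-00andover"

# Two capturing groups keep "e" and the digit, but the match is not anchored
# to the end of the string, so the "100andover" name gets a dash as well
attempt2 <- gsub("([e])([0-9])", "\\1-\\2", x)
print(attempt2)
# [1] "Male-97"         "Female-0"        "Male-100andover"
```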
You can use a regex with a capturing group anchored at the end of the string, like this:
dat <- c("Male97","Male98","Male99", "Male100andover","Female0","Female1" ,"Female2", "Female3", "Female4" ,"Female5", "Female100andover")
gsub("(\\d+)$", "-\\1", dat)
See IDEONE demo.
Explanation:
(\\d+) - matches and captures into Group 1 one or more digits that are...
$ - at the end of the string.
In the replacement pattern, \1 backreferences the captured digits.
Result:
[1] "Male-97" "Male-98" "Male-99" "Male100andover"
[5] "Female-0" "Female-1" "Female-2" "Female-3"
[9] "Female-4" "Female-5" "Female100andover"
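Since the question is about column names, the same call can be applied to names() directly. This is only a sketch: the data frame df below is a hypothetical stand-in for the real data.

```r
# Hypothetical data frame standing in for the real data
df <- data.frame(Male97 = 1, Male100andover = 2, Female0 = 3)

# sub() is enough here: the "$" anchor allows at most one match per name
names(df) <- sub("(\\d+)$", "-\\1", names(df))
print(names(df))
# [1] "Male-97"        "Male100andover" "Female-0"
```

Note that a name containing a dash is no longer a syntactic R name, so such a column has to be referenced with backticks, e.g. df$`Male-97`.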
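EDGE CASE HANDLING:
If the digits can be followed by letters, or occur more than once, you can choose which digit sequence gets the dash:
dat <- c("Male97", "Male98over", "Male99over100under") ## sample input, inferred from the output shown below
gsub("(\\d+\\D*)$", "-\\1", dat) ## insert before the last digit sequence
## [1] "Male-97" "Male-98over" "Male99over-100under"
gsub("^(\\D*)(\\d+)", "\\1-\\2", dat) ## insert before the first digit sequence
## [1] "Male-97" "Male-98over" "Male-99over100under"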
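Keep in mind that these two edge-case patterns no longer leave the '100 and over' names alone. A quick comparison on two of the original names (the variable names below are made up for illustration) shows the difference:

```r
orig <- c("Male97", "Male100andover")

# End-anchored pattern from the main answer: "100andover" is left untouched
end_only <- gsub("(\\d+)$", "-\\1", orig)
print(end_only)
# [1] "Male-97"        "Male100andover"

# Last digit sequence: trailing non-digits are allowed, so "100andover" matches
last_seq <- gsub("(\\d+\\D*)$", "-\\1", orig)
print(last_seq)
# [1] "Male-97"         "Male-100andover"

# First digit sequence: matches "100andover" too
first_seq <- gsub("^(\\D*)(\\d+)", "\\1-\\2", orig)
print(first_seq)
# [1] "Male-97"         "Male-100andover"
```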
See another demo
Upvotes: 4