Reputation: 704
I have a series of column names that I'm trying to standardize.
names <- c("apple", "banana", "orange", "apple1", "apple2", "apple10", "apple11", "banana2", "banana12")
I would like any name containing a one-digit number to be padded with a leading zero, so that I get:
apple
banana
orange
apple01
apple02
apple10
apple11
banana02
...
I've been trying to use stringr:
library(stringr)
strdouble <- str_detect(names, "[0-9]{2}")
strsingle <- str_detect(names, "[0-9]")
names[strsingle & !strdouble]  # the names containing exactly one digit
but I'm unable to figure out how to selectively replace/prepend the zero.
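The masking idea above can be finished by running the substitution only on the masked elements. This is a minimal sketch in base R, assuming `grepl()` as a stand-in for `str_detect()`; the variable names `has_digit`, `has_double`, `single` and `fixed` are illustrative, not from the question.

```r
names <- c("apple", "banana", "orange", "apple1", "apple2", "apple10",
           "apple11", "banana2", "banana12")

# Same masks as in the question, using base grepl() instead of str_detect()
has_digit  <- grepl("[0-9]", names)
has_double <- grepl("[0-9]{2}", names)
single     <- has_digit & !has_double

# Copy the input, then prepend "0" to the final digit only where the mask is TRUE
fixed <- names
fixed[single] <- sub("([0-9])$", "0\\1", names[single])
fixed
```

Subsetting on the left-hand side of `<-` leaves the unmasked names untouched, which is the "selective" part the question asks about.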
Upvotes: 8
Views: 2641
Reputation: 66819
str_pad from stringr:
library(stringr)
pad_if = function(x, cond, n, fill = "0") str_pad(x, n*cond, pad = fill)
s = str_split_fixed(names,"(?=\\d)",2)
# [,1] [,2]
# [1,] "apple" ""
# [2,] "banana" ""
# [3,] "orange" ""
# [4,] "apple" "1"
# [5,] "apple" "2"
# [6,] "apple" "10"
# [7,] "apple" "11"
# [8,] "banana" "2"
# [9,] "banana" "12"
paste0(s[,1], pad_if(s[,2], cond = nchar(s[,2]) > 0, n = max(nchar(s[,2]))))
# [1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12"
This also extends to cases like going from c("a","a2","a20","a202") to c("a","a002","a020","a202"), which the other approaches don't cover.
The stringr package is built on stringi, which I'd guess offers all the same functionality used here.
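For readers without stringr, the same split-then-pad idea can be sketched in base R. This is an illustration under my own assumptions: `pad_trailing` is a hypothetical helper, and it swaps `str_split_fixed()`/`str_pad()` for `sub()`, `substring()` and `strrep()`. Like `pad_if`, it pads every trailing number to the width of the longest one.

```r
names <- c("apple", "banana", "orange", "apple1", "apple2", "apple10",
           "apple11", "banana2", "banana12")

# Hypothetical helper: pad each trailing run of digits with `fill`
# up to the width of the longest trailing run in x.
pad_trailing <- function(x, fill = "0") {
  prefix <- sub("[0-9]+$", "", x)              # text before the trailing digits
  digits <- substring(x, nchar(prefix) + 1)    # trailing digits, "" if none
  has    <- nchar(digits) > 0                  # only pad names that have digits
  width  <- max(nchar(digits))
  digits[has] <- paste0(strrep(fill, width - nchar(digits[has])), digits[has])
  paste0(prefix, digits)
}

pad_trailing(names)
pad_trailing(c("a", "a2", "a20", "a202"))
```

Because the width comes from `max(nchar(digits))`, the second call pads to three characters, matching the c("a","a002","a020","a202") case above.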
sprintf from base, with a similar approach:
pad_if2 = function(x, cond, n, fill = "0")
replace(x, cond, sprintf(paste0("%",fill,n,"d"), as.numeric(x)[cond]))
s0 = strsplit(names,"(?<=\\D)(?=\\d)|$",perl=TRUE)
s1 = sapply(s0,`[`,1)
s2 = sapply(sapply(s0,`[`,-1), paste0, "")
paste0(s1, pad_if2(s2, cond = nchar(s2) > 0, n = max(nchar(s2))))
pad_if2 is less general than pad_if, since it requires x to be coercible to numeric. Pretty much every step here is clunkier than the corresponding code with the packages mentioned above.
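A less clunky base-R route to the same sprintf-based padding is to edit only the matched digits in place with the replacement form of `regmatches()`. This is a swapped-in technique, not the approach above, and `pad_digits_sprintf` is a hypothetical name. It shares pad_if2's limitation: the digits are converted with `as.integer()` before formatting.

```r
names <- c("apple", "banana", "orange", "apple1", "apple2", "apple10",
           "apple11", "banana2", "banana12")

# Hypothetical helper: zero-pad the trailing number of each element to the
# width of the longest trailing number, leaving digit-free elements alone.
pad_digits_sprintf <- function(x) {
  m <- regexpr("[0-9]+$", x)          # match data; -1 where there is no trailing number
  digits <- regmatches(x, m)          # only the matched substrings
  if (length(digits) == 0) return(x)  # nothing to pad
  width <- max(nchar(digits))
  # e.g. "%02d"; the replacement vector has one entry per matched element
  regmatches(x, m) <- sprintf(paste0("%0", width, "d"), as.integer(digits))
  x
}

pad_digits_sprintf(names)
pad_digits_sprintf(c("a", "a2", "a20", "a202"))
```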
Upvotes: 2
Reputation: 4554
The key is to identify a single digit at the end with $ and capture the non-digit character before it, so it can be put back. You could try:
gsub('([^0-9])([0-9])$','\\10\\2',names)
[1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12"
Or use a look-behind assertion (which requires perl=TRUE):
gsub('(?<=[a-z])(\\d)$','0\\1',names,perl=TRUE)
Upvotes: 0
Reputation: 3678
You can use sub("([a-z])([0-9])$","\\10\\2",names):
[1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02"
[9] "banana12"
It only changes the names where a single digit follows a letter (the $ marks the end of the string).
In the replacement, \\1 inserts the first block in () (the letter), then a literal 0, then \\2, the second block in () (the digit). Since R only supports the backreferences \\1 to \\9, \\10 is read as \\1 followed by 0.
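Both claims in the explanation can be checked directly. This sketch uses `grepl()` with the same pattern to list which names the substitution touches, then applies the replacement to a single string to show how "\\10" is read; the variable name `touched` is illustrative.

```r
names <- c("apple", "banana", "orange", "apple1", "apple2", "apple10",
           "apple11", "banana2", "banana12")

# Only a letter directly followed by a final digit matches, so "apple10"
# (a digit before the last digit) is skipped.
touched <- grepl("([a-z])([0-9])$", names)
names[touched]

# "\\10" is group 1 followed by a literal "0", not a reference to a group 10
sub("([a-z])([0-9])$", "\\10\\2", "apple1")
```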
Upvotes: 8
Reputation: 44614
Here's one option using negative look-ahead and look-behind assertions to identify single digits.
gsub('(?<!\\d)(\\d)(?!\\d)', '0\\1', names, perl=TRUE)
# [1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12"
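One difference worth knowing: the look-around pattern pads every isolated single digit, not just a trailing one, whereas the $-anchored patterns in the other answers only touch the end of the string. The inputs below are hypothetical, chosen only to show where the two behave differently; on the question's `names` both give the same result.

```r
# Hypothetical inputs: a digit in the middle, two single digits, and trailing numbers
x <- c("a1b", "x1y2", "item5", "item50")

# Anchored: only a letter followed by a single final digit is padded
anchored <- sub("([a-z])([0-9])$", "\\10\\2", x)

# Look-around: any digit with no digit on either side is padded
lookaround <- gsub("(?<!\\d)(\\d)(?!\\d)", "0\\1", x, perl = TRUE)

anchored
lookaround
```

Which behaviour is right depends on whether digits can appear mid-name in your columns.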
Upvotes: 6