ano

Reputation: 704

Add leading zeros within string

I have a series of column names that I'm trying to standardize.

names <- c("apple", "banana", "orange", "apple1", "apple2", "apple10", "apple11", "banana2", "banana12")

I would like any name that ends in a one-digit number to have that number padded with a zero, giving:

apple
banana
orange
apple01
apple02
apple10
apple11
banana02
...

I've been trying to use stringr

strdouble <- str_detect(names, "[0-9]{2}")
strsingle <- str_detect(names, "[0-9]")

names[strsingle & !strdouble]

but I'm unable to figure out how to selectively replace/prepend the zero.

Upvotes: 8

Views: 2641

Answers (4)

Frank

Reputation: 66819

str_pad from stringr:

library(stringr)

pad_if = function(x, cond, n, fill = "0") str_pad(x, n*cond, pad = fill)

s = str_split_fixed(names,"(?=\\d)",2)
#       [,1]     [,2]
#  [1,] "apple"  ""  
#  [2,] "banana" ""  
#  [3,] "orange" ""  
#  [4,] "apple"  "1" 
#  [5,] "apple"  "2" 
#  [6,] "apple"  "10"
#  [7,] "apple"  "11"
#  [8,] "banana" "2" 
#  [9,] "banana" "12"

paste0(s[,1], pad_if(s[,2], cond = nchar(s[,2]) > 0, n = max(nchar(s[,2]))))
# [1] "apple"    "banana"   "orange"   "apple01"  "apple02"  "apple10"  "apple11"  "banana02" "banana12"

This also extends to cases like going from c("a","a2","a20","a202") to c("a","a002","a020","a202"), which the other approaches don't cover.

The stringr package is built on stringi, which I'm guessing offers all the functionality used here.


sprintf from base, with a similar approach:

pad_if2 = function(x, cond, n, fill = "0") 
  replace(x, cond, sprintf(paste0("%",fill,n,"d"), as.numeric(x)[cond]))

s0 = strsplit(names,"(?<=\\D)(?=\\d)|$",perl=TRUE)

s1 = sapply(s0,`[`,1)
s2 = sapply(sapply(s0,`[`,-1), paste0, "")

paste0(s1, pad_if2(s2, cond = nchar(s2) > 0, n = max(nchar(s2))))

pad_if2 has less general use than pad_if, since it requires x be coercible to numeric. Pretty much every step here is clunkier than the corresponding code with the packages mentioned above.
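As a rough sketch, the same split, pad, and paste idea can be wrapped in a single base-R function. The name `pad_trailing_digits` and its `width` argument are hypothetical and not part of the answer above. It assumes the digits to pad sit at the end of each string and are short enough for `as.integer`.

```r
# Hypothetical helper (not from the answer): zero-pad a trailing run of digits.
pad_trailing_digits <- function(x, width = NULL) {
  stem   <- sub("[0-9]+$", "", x)            # everything before the trailing digits
  digits <- substring(x, nchar(stem) + 1L)   # the trailing digits, or "" if none
  has_digits <- nchar(digits) > 0
  if (!any(has_digits)) return(x)            # nothing to pad
  # Default width: the longest digit run found in the input
  if (is.null(width)) width <- max(nchar(digits))
  digits[has_digits] <- sprintf(paste0("%0", width, "d"),
                                as.integer(digits[has_digits]))
  paste0(stem, digits)
}

cols <- c("apple", "banana", "orange", "apple1", "apple2", "apple10",
          "apple11", "banana2", "banana12")
pad_trailing_digits(cols)
# [1] "apple"    "banana"   "orange"   "apple01"  "apple02"  "apple10"  "apple11"  "banana02" "banana12"

pad_trailing_digits(c("a", "a2", "a20", "a202"))
# [1] "a"    "a002" "a020" "a202"
```

Because the width defaults to the longest digit run in the input, the result depends on the whole vector. Pass `width` explicitly if the padding should stay fixed across calls.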

Upvotes: 2

Shenglin Chen

Reputation: 4554

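The key is to match a single digit at the end of the string ($) that is preceded by a non-digit. The preceding character must be captured and put back, otherwise the replacement consumes it:

gsub('([^0-9])([0-9])$','\\10\\2',names)
# [1] "apple"    "banana"   "orange"   "apple01"  "apple02"  "apple10"  "apple11"  "banana02" "banana12"

Or use a look-behind, which checks for the preceding letter without consuming it:

gsub('(?<=[a-z])(\\d)$','0\\1',names,perl=TRUE)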

Upvotes: 0

etienne

Reputation: 3678

You can use sub("([a-z])([0-9])$","\\10\\2",names):

[1] "apple"    "banana"   "orange"   "apple01"  "apple02"  "apple10"  "apple11"  "banana02"
[9] "banana12"

It only changes the names where there is a single digit following a letter (the $ is the end of the string).

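The \\1 refers to the first group in (): the letter. It is followed by a literal 0, then by \\2, the second group in (): the digit. R only recognizes the backreferences \\1 through \\9, so \\10 is read as \\1 followed by 0.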
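A small sketch of how this pattern behaves on a few inputs. The vector `cols` and the wrapper `pad_after_letter` are hypothetical names used only for illustration. Note that the pattern requires a lowercase letter directly before the digit, so a name like "x_1" is left unchanged.

```r
# Hypothetical wrapper around the sub() call from the answer
pad_after_letter <- function(x) sub("([a-z])([0-9])$", "\\10\\2", x)

cols <- c("apple", "apple1", "apple10", "banana2", "x_1")
pad_after_letter(cols)
# [1] "apple"    "apple01"  "apple10"  "banana02" "x_1"
```

If separators such as underscores can appear before the digit, widening the first group to any non-digit, `([^0-9])`, is one possible adjustment.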

Upvotes: 8

Matthew Plourde

Reputation: 44614

Here's one option using negative look-ahead and look-behind assertions to identify single digits.

gsub('(?<!\\d)(\\d)(?!\\d)', '0\\1', names, perl=TRUE)
# [1] "apple"    "banana"   "orange"   "apple01"  "apple02"  "apple10"  "apple11"  "banana02" "banana12"
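Since this pattern is not anchored to the end of the string, it should pad every isolated digit, wherever it occurs. A brief sketch, where `pad_single_digits` is a hypothetical wrapper name and the inputs are made up for illustration:

```r
# Hypothetical wrapper around the gsub() call from the answer
pad_single_digits <- function(x) {
  # A digit with no digit immediately before it and none immediately after it
  gsub("(?<!\\d)(\\d)(?!\\d)", "0\\1", x, perl = TRUE)
}

pad_single_digits(c("apple1", "apple10", "a1b22c3", "7up"))
# [1] "apple01"   "apple10"   "a01b22c03" "07up"
```

This is convenient when numbers appear mid-name, but it also means digits you did not intend to pad will be changed. Append `$` to the pattern if only a trailing digit should be affected.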

Upvotes: 6
