keenan

Reputation: 504

R: add a character to a specific spot in string, trouble with regex syntax

I have a list of strings like so: batch1, batch2, batch3, batch10, batch11

I am trying to add a 0 before the single digits to get: batch01, batch02, batch03, batch10, batch11

I have found many similar questions and tried to write my own regex. I am very close, but I can't quite make it do what I want.

Batch <- gsub('(.{5})([0-9]{1}\\b)','\\10\\2', Batch) outputs batch01, batch02, batch03, batch100, batch110

Using \\s instead of \\b doesn't change any values.

sampleNames$Batch <- gsub('(.{5})([0-9]{1})','\\10\\2', sampleNames$Batch) outputs batch01, batch02, batch03, batch010, batch011

I've played around with a few other versions but I cannot seem to get it right. I know this is a somewhat repetitive question, but I have not been able to adapt previous solutions to do what I need.

Upvotes: 2

Views: 102

Answers (3)

Anoushiravan R

Reputation: 21908

You can also use the following solution:

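# vec holds the batch names from the question
vec <- c("batch1", "batch2", "batch3", "batch10", "batch11")
sapply(vec, function(x) {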
  d <- gsub("([[:alpha:]]+)(\\d)", "\\2", x)
  if(nchar(d) == 1) {
    gsub("([[:alpha:]]+)(\\d)", "\\10\\2", x)
  } else {
    x
  }
})

   batch1    batch2    batch3   batch10   batch11 
"batch01" "batch02" "batch03" "batch10" "batch11"

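The same check can probably be done without the per-element function. This is only a sketch, assuming every element ends in its number; `digits` and `res` are names made up for the illustration, and unlike `sapply` the result is an unnamed vector:

```r
vec <- c("batch1", "batch2", "batch3", "batch10", "batch11")

# Keep only the digits of each element so they can be counted.
digits <- gsub("[^0-9]", "", vec)

# Insert a 0 before the trailing digit only where there is exactly one digit.
res <- ifelse(nchar(digits) == 1, sub("([0-9])$", "0\\1", vec), vec)
print(res)
```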
Upvotes: 1

Ryszard Czech

Reputation: 18611

Use

sampleNames$Batch <- sub("(\\D|^)(\\d)$", "\\10\\2", sampleNames$Batch, perl=TRUE)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \D                       non-digits (all but 0-9)
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    ^                        the beginning of the string
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
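
A quick way to check the pattern is to run it on the sample values from the question. This is a sketch on a standalone vector rather than `sampleNames$Batch`; the extra element "7" and the name `out` are assumptions added to exercise the `^` branch:

```r
# Sample values from the question, plus a bare single digit ("7")
# to show the "beginning of the string" alternative.
Batch <- c("batch1", "batch2", "batch3", "batch10", "batch11", "7")

# \\1 is the non-digit (or empty string at the start), then a literal 0,
# then \\2, the single trailing digit. R only has backreferences \\1 to \\9,
# so "\\10" is read as group 1 followed by "0".
out <- sub("(\\D|^)(\\d)$", "\\10\\2", Batch, perl = TRUE)
print(out)
```

Values that already end in two digits are left alone because the character before the last digit is itself a digit, so neither alternative of the first group can match.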

Upvotes: 1

akrun

Reputation: 887028

We can capture the last digit and the lowercase letter before it as two groups, then in the replacement specify the backreferences of the groups with the 0 in between. This way, it won't match strings that end in two digits.

sub("([a-z])(\\d)$", "\\10\\2", Batch)
[1] "batch01" "batch02" "batch03" "batch10" "batch11"

Or we may use sprintf/str_pad with str_replace

library(stringr)
str_replace(Batch, "\\d+$", function(x) sprintf("%02d", as.numeric(x)))
[1] "batch01" "batch02" "batch03" "batch10" "batch11"

data

Batch <- c("batch1", "batch2", "batch3", "batch10", "batch11")
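
If stringr is not available, the sprintf idea can also be sketched in base R. This assumes each value is a non-digit prefix followed by a trailing number; `prefix`, `number` and `padded` are hypothetical names used only here:

```r
Batch <- c("batch1", "batch2", "batch3", "batch10", "batch11")

# Split each name into its prefix and its trailing number.
prefix <- sub("[0-9]+$", "", Batch)
number <- as.integer(sub("^.*[^0-9]", "", Batch))

# Zero-pad the number to width 2 and glue the pieces back together.
padded <- paste0(prefix, sprintf("%02d", number))
print(padded)
```

Changing the width in "%02d" (for example to "%03d") would pad to more digits if the batch numbers ever go past 99.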

Upvotes: 1
