SriniShine

Reputation: 1139

Replacing the nth number in a string

I have a set of files which I named incorrectly. The file names are as follows.

Generation_Flux_0_Model_200.txt
Generation_Flux_101_Model_43.txt
Generation_Flux_11_Model_3.txt

I need to replace the second number (the model number) by adding 1 to the existing number. So the correct names would be

Generation_Flux_0_Model_201.txt
Generation_Flux_101_Model_44.txt
Generation_Flux_11_Model_4.txt

This is the code I wrote. I would like to know how to specify the position of the number in the regex (that is, how to replace the second number in the string with the new number).

library(stringr)  # provides str_extract_all

reNameModelNumber <- function(modelName){

  #get the current model number
  modelNumber = as.numeric(unlist(str_extract_all(modelName, "\\d+"))[2])

  #increment it by 1
  newModelNumber = modelNumber + 1

  #building the new name with gsub 
  newModelName = gsub("  regex ", newModelNumber, modelName) 

  #rename
  file.rename(modelName, newModelName)


}


reactionFiles = list.files(pattern = "^Generation_Flux_\\d+_Model_\\d+\\.txt$")

sapply(reactionFiles, function(x) reNameModelNumber(x))

Upvotes: 6

Views: 353

Answers (4)

lmo

Reputation: 38500

Assuming that the number to change always occurs directly before the extension, as is mentioned in the comments, here is another base R solution that is a little bit simpler.

sapply(regmatches(tmp, regexec("\\d+(?=\\.)", tmp, perl=TRUE), invert=NA),
       function(x) paste0(c(x[1], as.integer(x[2]) + 1L, x[3]), collapse=""))

This returns

[1] "Generation_Flux_0_Model_201.txt"  "Generation_Flux_101_Model_44.txt"
[3] "Generation_Flux_11_Model_4.txt" 

regexec returns a list of match positions, one element per input string. regmatches with invert=NA takes this information and returns a list of character vectors that breaks up each original string along the match: the text before the match, the matched digits as the second element, and the text after the match. Feed this list to sapply, convert the second element to integer and increment it. Then paste the pieces back together to return an atomic character vector.

The regex "\\d+(?=\\.)" uses a Perl lookahead, "(?=\\.)", which requires a dot right after the digits without including it in the match, while "\\d+" matches the digits themselves.
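To make the splitting step easier to follow, here is a small sketch on a single file name. The variable names (x, pieces, new_name) are only for illustration and are not part of the answer above.

```r
# One file name from the question, used only for illustration
x <- "Generation_Flux_0_Model_200.txt"

# Locate the digits that sit directly before a dot (lookahead needs perl = TRUE)
m <- regexec("\\d+(?=\\.)", x, perl = TRUE)

# invert = NA returns the unmatched and matched pieces in their original order:
# text before the match, the match itself, text after the match
pieces <- regmatches(x, m, invert = NA)[[1]]
print(pieces)

# Increment the middle piece and glue everything back together
pieces[2] <- as.integer(pieces[2]) + 1L
new_name <- paste0(pieces, collapse = "")
print(new_name)
```

The sapply call in the answer does the same thing for every element of the input vector.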

data

tmp <- c("Generation_Flux_0_Model_200.txt", "Generation_Flux_101_Model_43.txt", 
"Generation_Flux_11_Model_3.txt")

Upvotes: 2

s_baldur

Reputation: 33488

Using base-R.

data <- c( # Just an example
  "Generation_Flux_0_Model_200.txt",
  "Generation_Flux_101_Model_43.txt",
  "Generation_Flux_11_Model_3.txt"
)

fixNameModel <- function(data){
  n <- length(data)

  # get the current model number and increment it by 1
  newn = as.integer(sub(".+_(\\d+)\\.txt", "\\1", data)) + 1L

  #building the new name with gsub
  newModelName <- vector(mode = "character", length = n)
  for (i in seq_len(n)) {
    newModelName[i] <- gsub("\\d+\\.txt$", paste0(newn[i], ".txt"), data[i])
  }
  newModelName
}

fixNameModel(data)
[1] "Generation_Flux_0_Model_201.txt"  "Generation_Flux_101_Model_44.txt"
[3] "Generation_Flux_11_Model_4.txt"  

You can now do something like file.rename(modelName, fixNameModel(modelName)), where modelName is the vector of existing file names.
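Before renaming real files it may be worth previewing the result first. The sketch below is one cautious way to do that; rename_checked and its dry_run argument are hypothetical names introduced here, not part of base R or of the answer above.

```r
# Hypothetical helper (not from the answer): preview first, rename only when
# none of the target names exists yet
rename_checked <- function(old, new, dry_run = TRUE) {
  plan <- data.frame(old = old, new = new, stringsAsFactors = FALSE)
  if (dry_run) return(plan)              # preview only, nothing is touched
  if (any(file.exists(new))) {
    stop("some target names already exist; rename in a safe order instead")
  }
  plan$renamed <- file.rename(old, new)  # TRUE for each successful rename
  plan
}

# Try it in a scratch directory so that no real files are affected
demo_dir <- tempfile("rename_demo")
dir.create(demo_dir)
old <- file.path(demo_dir, "Generation_Flux_11_Model_3.txt")
new <- file.path(demo_dir, "Generation_Flux_11_Model_4.txt")
file.create(old)

preview <- rename_checked(old, new)                   # dry run
result  <- rename_checked(old, new, dry_run = FALSE)  # actual rename
```

Note that if one Flux value has consecutive model numbers (say Model_3 and Model_4 both exist), shifting every number up by 1 makes some targets collide with files that have not been renamed yet. In that case this check stops, and you would probably need to process the highest model numbers first.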

EDIT:

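Here is a somewhat neater version, but it makes stronger assumptions: every name must split on "_" and "." into exactly six pieces, with the model number as the fifth piece and "txt" as the last.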

fixNameModel2 <- function(data) {
  sapply(
    strsplit(data, "_|\\."), 
    function(x) {
      x[5] <- as.integer(x[5]) + 1L
      x <- paste0(x, collapse = "_")
      gsub("_txt", ".txt", x, fixed = TRUE)
    } 
  )
}
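To see what this version relies on, it can help to look at the split for one name. The guard function has_expected_layout below is a hypothetical addition for illustration, assuming names of the form shown in the question.

```r
x <- "Generation_Flux_11_Model_3.txt"

# Splitting on "_" or "." gives six pieces; the model number is the fifth
parts <- strsplit(x, "_|\\.")[[1]]
print(parts)

# Hypothetical guard (not part of the answer): TRUE only if the name has
# exactly six pieces, a purely numeric fifth piece and "txt" as the last one
has_expected_layout <- function(name) {
  p <- strsplit(name, "_|\\.")[[1]]
  length(p) == 6L && grepl("^\\d+$", p[5]) && p[6] == "txt"
}

ok  <- has_expected_layout(x)
bad <- has_expected_layout("Generation_Flux_11_v2_Model_3.txt")
```

A name with an extra underscore-separated piece fails the check, which is exactly the kind of input for which fixNameModel2 would increment the wrong piece.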

Upvotes: 4

Wiktor Stribiżew

Reputation: 626738

Answering the question, if you want to increment a certain number inside a string, you may use

> library(gsubfn)
> nth = 2
> reactionFiles <- c("Generation_Flux_0_Model_200.txt", "Generation_Flux_101_Model_43.txt", "Generation_Flux_11_Model_3.txt")
> gsubfn(paste0("^((?:\\D*\\d+){", nth-1, "}\\D*)(\\d+)"), function(x,y,z) paste0(x, as.numeric(y) + 1), reactionFiles)
[1] "Generation_Flux_0_Model_201.txt"  "Generation_Flux_101_Model_44.txt" "Generation_Flux_11_Model_4.txt"  

Here, nth is the position of the digit chunk to increment (2 for the second number in the string).

Pattern details

  • ^((?:\\D*\\d+){n}\\D*) - Capturing group 1 (the value is accessed in the gsubfn method via x), anchored at the start of the string; here n is nth-1:
    • (?:\\D*\\d+){n} - n occurrences of
      • \\D* - 0 or more chars other than digits
      • \\d+ - 1 or more digits
    • \\D* - 0 or more non-digits
  • (\\d+) - Capturing group 2 (the value is accessed in the gsubfn method via y): one or more digits
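If installing gsubfn is not an option, the same "nth number" idea can be sketched in base R. This is a different technique from the one above (it uses gregexpr with the replacement form of regmatches rather than a single anchored pattern), and increment_nth_number and its by argument are hypothetical names.

```r
# Hypothetical base R helper: add `by` to the nth run of digits in each string
increment_nth_number <- function(x, nth, by = 1L) {
  m <- gregexpr("\\d+", x)           # positions of every digit run
  nums <- regmatches(x, m)           # list of digit runs, one vector per string
  nums <- lapply(nums, function(v) {
    # leave strings with fewer than nth numbers unchanged
    if (length(v) >= nth) v[nth] <- as.character(as.integer(v[nth]) + by)
    v
  })
  regmatches(x, m) <- nums           # write the modified runs back in place
  x
}

files <- c("Generation_Flux_0_Model_200.txt",
           "Generation_Flux_101_Model_43.txt",
           "Generation_Flux_11_Model_3.txt")

second <- increment_nth_number(files, nth = 2)  # model number
first  <- increment_nth_number(files, nth = 1)  # flux number instead
```

Changing nth selects which number is incremented, and the rest of each string is left as it was.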

Upvotes: 6

akrun

Reputation: 887008

We can use gsubfn to increment by 1. Capture the digits ((\\d+)) followed by a . and 'txt' at the end ($) of the string, and replace the match with the captured number plus 1. Because the match includes the extension, ".txt" is pasted back on in the replacement.

library(gsubfn)
gsubfn("(\\d+)\\.txt$", ~ paste0(as.numeric(x) + 1, ".txt"), str1)
#[1] "Generation_Flux_0_Model_201.txt"  "Generation_Flux_101_Model_44.txt"
#[3] "Generation_Flux_11_Model_4.txt"  

data

str1 <- c("Generation_Flux_0_Model_200.txt", "Generation_Flux_101_Model_43.txt", 
                   "Generation_Flux_11_Model_3.txt")

Upvotes: 8
