rleenay

Reputation: 21

Replacing nth instance of a character string using sub/gsub in R

I am trying to rename some character strings given to me in a large list. The issue is that I only need to replace some of the characters, not all of them.

exdata <- c("i_am_having_trouble_with_this_string",
            "i_am_wishing_files_were_cleaner_for_me",
            "any_help_would_be_greatly_appreciated")

From this list, for example, I would like to replace the third through the fifth instance of "_" with "-". I am having trouble understanding the regex coding for this, as most examples split strings up instead of keeping them intact.

Upvotes: 1

Views: 3494

Answers (2)

G. Grothendieck

Reputation: 269481

Here are some alternative approaches. All of them can be generalized to arbitrary bounds by replacing 3 and 5 with other numbers.

1) strsplit Split the strings at each underscore and use paste to collapse the pieces back together with the appropriate separators. No packages are used.

i <- 3
j <- 5
sapply(strsplit(exdata, "_"), function(x) {
  g <- seq_along(x)
  g[g < i] <- i
  g[g > j + 1] <- j+1
  paste(tapply(x, g, paste, collapse = "_"), collapse = "-")
})

giving:

[1] "i_am_having-trouble-with-this_string"  
[2] "i_am_wishing-files-were-cleaner_for_me"
[3] "any_help_would-be-greatly-appreciated" 
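To see why this works, it can help to trace the grouping vector g for a single string. The sketch below repeats the steps of the function above on the first element only; the names pieces, merged and result are introduced here purely for illustration.

```r
i <- 3
j <- 5
pieces <- strsplit("i_am_having_trouble_with_this_string", "_")[[1]]

g <- seq_along(pieces)   # 1 2 3 4 5 6 7
g[g < i] <- i            # 3 3 3 4 5 6 7: pieces before the i-th "_" share one group
g[g > j + 1] <- j + 1    # 3 3 3 4 5 6 6: pieces after the j-th "_" share one group

# Rejoin the pieces inside each group with "_", then join the groups with "-"
merged <- tapply(pieces, g, paste, collapse = "_")
result <- paste(merged, collapse = "-")
result
```

Pieces that end up in the same group keep their underscores, so only the boundaries between groups (the third through fifth underscores) become hyphens.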

2) for loop This translates the first j occurrences of old to new in x and then translates the first i-1 occurrences of new back to old. No packages are used.

translate <- function(old, new, x, i = 1, j) {
  if (i <= 1) {
    if (j > 0) for (k in seq_len(j)) x <- sub(old, new, x, fixed = TRUE)
    x
  } else Recall(new, old, Recall(old, new, x, 1, j), 1, i - 1)
}

translate("_", "-", exdata, 3, 5)

giving:

[1] "i_am_having-trouble-with-this_string"  
[2] "i_am_wishing-files-were-cleaner_for_me"
[3] "any_help_would-be-greatly-appreciated" 
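The two passes can be traced by hand on one string. This is only an illustrative unrolling of what translate("_", "-", x, 3, 5) does; the names after_pass1 and after_pass2 are not part of the function above.

```r
x <- "i_am_having_trouble_with_this_string"

# Pass 1: turn the first 5 "_" into "-"
for (k in seq_len(5)) x <- sub("_", "-", x, fixed = TRUE)
after_pass1 <- x

# Pass 2: turn the first 2 "-" back into "_"
for (k in seq_len(2)) x <- sub("-", "_", x, fixed = TRUE)
after_pass2 <- x

after_pass1
after_pass2
```

One caveat, assuming the input may contain hyphens of its own: the second pass cannot tell an original "-" from one created in the first pass, so this approach is safest when the new character does not already occur in the data.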

3) gsubfn This uses a package but in return is substantially shorter than the others. gsubfn is like gsub except that the replacement can be a string, list, function or proto object. In the case of a proto object, the fun method of the proto object is invoked each time there is a match to the regular expression. Below, the matching string is passed to fun as x, and the output of fun replaces the match in the data. The proto object is automatically populated with a number of variables set by gsubfn and accessible by fun, including count, which is 1 for the first match, 2 for the second and so on. For more information see the gsubfn vignette; section 4 discusses the use of proto objects.

library(gsubfn)

p <- proto(i = 3, j = 5, 
      fun = function(this, x) if (count >= i && count <= j) "-" else x)
gsubfn("_", p, exdata)

giving:

[1] "i_am_having-trouble-with-this_string"  
[2] "i_am_wishing-files-were-cleaner_for_me"
[3] "any_help_would-be-greatly-appreciated" 
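If installing a package is not an option, the same count-the-matches idea can be sketched in base R with gregexpr and the regmatches replacement function. This is a different technique from the proto object above, and replace_range is a hypothetical helper name, not something provided by gsubfn or base R.

```r
# Hypothetical helper: replace the i-th through j-th occurrence of `old` with `new`
replace_range <- function(x, old, new, i, j) {
  m <- gregexpr(old, x, fixed = TRUE)      # positions of every match, per string
  hits <- regmatches(x, m)                 # list of matched substrings, per string
  regmatches(x, m) <- lapply(hits, function(h) {
    k <- seq_along(h)                      # plays the role of gsubfn's count
    h[k >= i & k <= j] <- new
    h
  })
  x
}

exdata <- c("i_am_having_trouble_with_this_string",
            "i_am_wishing_files_were_cleaner_for_me",
            "any_help_would_be_greatly_appreciated")

replace_range(exdata, "_", "-", 3, 5)
```

Strings with fewer than j matches simply have whatever matches exist from the i-th onward replaced, and strings with no match are returned unchanged.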

Upvotes: 4

BigTimeStats

Reputation: 447

> gsub('(.*_.*_.*?)_(.*?)_(.*?)_(.*)','\\1-\\2-\\3-\\4', exdata)
[1] "i_am_having-trouble-with-this_string"   "i_am_wishing-files-were-cleaner_for_me" "any_help_would-be-greatly-appreciated"
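The pattern above depends on how the regex engine resolves the mix of greedy (.*) and lazy (.*?) quantifiers, which may differ between R's default engine and perl = TRUE. A variant that avoids this, sketched here as an assumption rather than as part of the original answer, anchors at the start of the string and uses [^_]* so that each segment cannot run across an underscore.

```r
exdata <- c("i_am_having_trouble_with_this_string",
            "i_am_wishing_files_were_cleaner_for_me",
            "any_help_would_be_greatly_appreciated")

# Group 1: the first two "segment_" pairs plus the third segment.
# Groups 2 and 3: the segments between underscores 3-4 and 4-5.
# The three literal "_" outside the groups are underscores 3, 4 and 5.
anchored <- sub("^((?:[^_]*_){2}[^_]*)_([^_]*)_([^_]*)_",
                "\\1-\\2-\\3-", exdata, perl = TRUE)
anchored
```

Unlike the other approaches, this leaves a string untouched if it has fewer than five underscores, because the whole pattern must match for any replacement to happen.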

Upvotes: 2
