Using str_sub from stringr to replace characters in all elements of a vector

Question

I am trying to replace characters in all elements of a string vector. The characters differ, but they are always a fixed distance from the beginning or the end of the string. I can use substr to replace characters counted from the beginning. To replace characters counted from the end, I am trying str_sub from the stringr package, because it allows counting backwards with negative numbers. It replaces the character, but for every element after the first it also replaces everything to the right of that character with the end of the first element's string:

> require(stringr)
> x <- c("A'B'C","E!FG@H","I$JKL&M")
> substr(x,2,2) <- ":"
> x
[1] "A:B'C"   "E:FG@H"  "I:JKL&M"
> str_sub(x,-2,-2) <- ":"
> x
[1] "A:B:C"   "E:FG:C"  "I:JKL:C"

Hana · Accepted Answer

Try this:

require(stringr)
x <- c("A'B'C","E!FG@H","I$JKL&M")
substr(x,2,2) <- ":"
str_sub(x, rep(-2, length(x)),  rep(-2, length(x))) <- ":"

It behaves like this because str_sub<- rebuilds each string by passing pieces of str_sub(x, start, end) through str_c, and that recombination is where things go wrong for you.

The function in the source code is:

"str_sub<-" <- function(string, start = 1L, end = -1L, value) {
  str_c(
    str_sub(string, end = start - 1L),
    value,
    ifelse(end == -1L, "", str_sub(string, start = end + 1L)))
}

So we are effectively passing three arguments to the str_c function: the part of each string to the left of the replacement, the replacement value, and the part to the right of it (the ifelse bit, named collapse.string below). Assuming the first substr replacement has already been run, evaluating those pieces by hand for start = -2 and end = -2 gives:

> (test.string <- str_sub(x, start = 1L, end = -2 - 1L)) #start defaults to 1L
[1] "A:B"   "E:FG"  "I:JKL"
> replace.string <- ":"
> (collapse.string <- ifelse(-2 == -1L, "", str_sub(x, start = -2 + 1L)))
[1] "C"
> str_c(test.string, replace.string, collapse.string)
[1] "A:B:C"   "E:FG:C"  "I:JKL:C"

So first we save everything to the left of the symbol you want to replace, then the replacement, and then whatever should follow it. Despite its name here, that last piece is not str_c's collapse argument: it is passed positionally, so str_c treats it as one more vector to join element by element, recycling it when it is shorter than the others. For comparison, the docs for str_c describe collapse like this:

If collapse is ... non-‘NULL’ that string is inserted at the end of each row, and the entire matrix collapsed to a single string.

That is not what happens here, because the result is still a vector of three strings. What actually happens is that the single value "C" is recycled and appended to every element.

The real culprit is the ifelse call. ifelse returns a result with the same length as its test, and here the test end == -1L has length one, so only the first element, "C", survives. Without the ifelse, str_sub(string, start = end + 1L) would return [1] "C" "H" "M" and the replacement would work.
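To see the ifelse behaviour in isolation, here is a minimal sketch in base R. It does not call stringr at all: the vectors left and right are hypothetical stand-ins for the pieces that str_sub would return, and paste0 stands in for str_c.

```r
# Base R only; these vectors stand in for the pieces str_sub would return.
left  <- c("A:B", "E:FG", "I:JKL")   # text before the replaced character
right <- c("C", "H", "M")            # text after the replaced character

scalar_end <- -2L                      # a single end value, as in the question
vector_end <- rep(-2L, length(right))  # one end value per element

# ifelse() returns a result shaped like its test, so a length-one test
# keeps only the first element of `right`.
short_tail <- ifelse(scalar_end == -1L, "", right)
full_tail  <- ifelse(vector_end == -1L, "", right)

# paste0() recycles the length-one tail across every element.
broken <- paste0(left, ":", short_tail)
fixed  <- paste0(left, ":", full_tail)

print(short_tail)  # "C"
print(full_tail)   # "C" "H" "M"
print(broken)      # "A:B:C" "E:FG:C" "I:JKL:C"
print(fixed)       # "A:B:C" "E:FG:H" "I:JKL:M"
```

The only difference between the two runs is the length of the test passed to ifelse, which is what the rep(-2, length(x)) workaround changes.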

This is why passing start and end as c(-2, -2, -2) gives the right answer. The test now has length three, so ifelse keeps all three endings:

> (test.string <- str_sub(x, start = 1L, end = c(-2, -2, -2) - 1L)) #start defaults to 1L
[1] "A:B"   "E:FG"  "I:JKL"
> replace.string <- ":"
> (collapse.string <- ifelse(c(-2, -2, -2) == -1L, "", str_sub(x, start = c(-2, -2, -2) + 1L)))
[1] "C" "H" "M"
> str_c(test.string, replace.string, collapse.string)
[1] "A:B:C"   "E:FG:H"  "I:JKL:M"
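If you would rather not depend on how str_sub<- treats its arguments, one alternative is to stay in base R and compute the position from the end yourself with nchar. This is only a sketch and is not part of the answer above; it assumes every string has at least two characters.

```r
x <- c("A'B'C", "E!FG@H", "I$JKL&M")

# Replace the second character from the start, as in the question.
substr(x, 2, 2) <- ":"

# Replace the second character from the end: nchar() gives one length
# per element, so the start/stop positions are already vectors.
n <- nchar(x)
substr(x, n - 1, n - 1) <- ":"

print(x)  # "A:B:C" "E:FG:H" "I:JKL:M"
```

Because n has one entry per element of x, each string is edited at its own position and nothing is shared between elements.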
