Insertion of characters in strings in R

Question

I would like to transform a string into an expression that can be evaluated in R. Specifically: insert "&" between adjacent letters (upper- or lower-case), but not before the first or after the last letter of a group; replace each lower-case letter x by tt$X==0 and each upper-case letter X by tt$X==1; replace each + by ) | (; and wrap the entire string in an opening and a closing parenthesis. For example, I have the string

st <- "AbC + de + FGHIJ"

The result should then look like this:

"(tt$A==1 & tt$B==0 & tt$C==1) | (tt$D==0 & tt$E==0) | (tt$F==1 & tt$G==1 & tt$H==1 & tt$I==1 & tt$J==1)"

Could I easily do this with the gsub() function?
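For context, here is a minimal sketch of how the target string could be evaluated once it has been built. It assumes tt is a data frame with 0/1 columns named A to J; the three rows below are made-up illustration data, not from the question.

```r
# Made-up 0/1 data: one column per letter A..J (illustration only)
tt <- data.frame(
  A = c(1, 0, 0), B = c(0, 0, 1), C = c(1, 1, 0),
  D = c(1, 0, 1), E = c(1, 0, 1),
  F = c(0, 0, 1), G = c(0, 0, 1), H = c(0, 0, 1), I = c(0, 0, 1), J = c(0, 0, 0)
)

expr_string <- "(tt$A==1 & tt$B==0 & tt$C==1) | (tt$D==0 & tt$E==0) | (tt$F==1 & tt$G==1 & tt$H==1 & tt$I==1 & tt$J==1)"

# parse() turns the text into an R expression; eval() runs it,
# giving one TRUE/FALSE per row of tt
matches <- eval(parse(text = expr_string))
print(matches)  # TRUE TRUE FALSE

# The same condition written out directly, for comparison
direct <- with(tt, (A == 1 & B == 0 & C == 1) | (D == 0 & E == 0) |
                   (F == 1 & G == 1 & H == 1 & I == 1 & J == 1))
stopifnot(identical(matches, direct))
```

Row 1 satisfies the first group, row 2 the second, and row 3 fails all three (J is 0), so tt[matches, ] would keep only the first two rows.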

Spacedman · Accepted Answer

A chain of regexps is rarely elegant and is often hard to debug. The regexp solution above fails if the input doesn't have exactly that spacing between elements:

> tt("aBc+b")
[1] "(tt$A==0 & tt$B==1 & tt$C==0+tt$B==0)"
> tt("aBc + b")
[1] "(tt$A==0 & tt$B==1 & tt$C==0) | (tt$B==0)"

Sometimes you just have to split the bits up yourself and process them. Here's a solution:

# Turn single letters into terms: upper-case X -> tt$X==1, lower-case x -> tt$X==0
doChar = Vectorize(
    function(ch){
        sprintf("tt$%s==%s",toupper(ch),ifelse(ch %in% LETTERS,"1","0"))
    }
)

# Turn one word (e.g. "AbC") into a parenthesised group of terms joined by &
doWord = Vectorize(function(W){
    cs = strsplit(W,"")[[1]]
    paste0("(",
           paste(doChar(cs),collapse=" & "),
           ")")
})

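# Split on "+", drop spaces, convert each word, and join the groups with |
processString = function(st){
    parts = strsplit(st,"\\+")[[1]]   # "\\+" matches a literal plus sign
    parts = gsub(" ","",parts)
    paste0(doWord(parts),collapse=" | ")
}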

There are probably many ways to make it better, but it has the benefit of being easier to debug (you can test the parts) and it looks less like line noise :)
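As an illustration of testing the parts, here is a sketch of a slightly tidied variant of the functions above, with a check on each layer. It is an editorial variant, not the answer's exact code: doChar drops Vectorize (sprintf, toupper and ifelse are already vectorized), doWord uses vapply with USE.NAMES = FALSE so it returns a plain unnamed character vector, and processString splits with fixed = TRUE so the + needs no regex escaping.

```r
# Per-character step: already vectorized, so no Vectorize() wrapper needed
doChar <- function(ch) {
  sprintf("tt$%s==%s", toupper(ch), ifelse(ch %in% LETTERS, "1", "0"))
}

# Per-word step: one "(... & ...)" group per input word
doWord <- function(words) {
  vapply(words, function(w) {
    paste0("(", paste(doChar(strsplit(w, "")[[1]]), collapse = " & "), ")")
  }, character(1), USE.NAMES = FALSE)
}

# Whole string: split on a literal "+", drop spaces, join groups with |
processString <- function(st) {
  parts <- strsplit(st, "+", fixed = TRUE)[[1]]
  parts <- gsub(" ", "", parts)
  paste(doWord(parts), collapse = " | ")
}

# Each layer can be checked on its own
stopifnot(
  identical(doChar(c("a", "B")), c("tt$A==0", "tt$B==1")),
  identical(doWord("aB"), "(tt$A==0 & tt$B==1)"),
  identical(processString("AbC + de"),
            "(tt$A==1 & tt$B==0 & tt$C==1) | (tt$D==0 & tt$E==0)")
)
```

If one of these checks fails, it points straight at the layer that is wrong, which is much harder to do with a single chain of gsub() calls.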

For the sample string it returns the same result as tt, my wrapper function around the regexp solution:

> tt(st)==processString(st)
[1] TRUE

But unlike tt, it also handles variable spacing:

> processString("aBc + deF") == processString("aBc+deF")
[1] TRUE

It's always a good idea to write code that is somewhat flexible about the inputs it accepts. Notice also that the tt prefix of the output terms appears only once in this code (in doChar), so if you want to output foo$A instead of tt$A, only one change is needed. The regexp solution has it in three places (or maybe four if I've missed one!).
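Taking both points a step further, here is a sketch where the data-frame name is an argument (prefix is a hypothetical parameter, not part of the answer), all whitespace including tabs and newlines is stripped, and malformed input raises an error instead of producing a string that fails later in parse().

```r
# Sketch: 'prefix' is a hypothetical argument naming the data frame
processString <- function(st, prefix = "tt") {
  st <- gsub("[[:space:]]", "", st)  # drop spaces, tabs and newlines
  # Accept only letter groups separated by single "+" signs
  if (!grepl("^[A-Za-z]+(\\+[A-Za-z]+)*$", st)) {
    stop("expected letter groups separated by '+', got: ", st)
  }
  words <- strsplit(st, "+", fixed = TRUE)[[1]]
  clauses <- vapply(words, function(w) {
    chars <- strsplit(w, "")[[1]]
    # upper-case letter -> ==1, lower-case letter -> ==0
    terms <- paste0(prefix, "$", toupper(chars), "==",
                    ifelse(chars %in% LETTERS, "1", "0"))
    paste0("(", paste(terms, collapse = " & "), ")")
  }, character(1), USE.NAMES = FALSE)
  paste(clauses, collapse = " | ")
}

processString("AbC + de + FGHIJ")
processString("aB\t+ c", prefix = "foo")  # "(foo$A==0 & foo$B==1) | (foo$C==0)"
```

Validating up front means an input such as "a++b" stops with a readable message rather than yielding an empty "()" group.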
