Mikko

Reputation: 7755

R: Capitalizing everything after a certain character

I would like to capitalize everything in a character vector that comes after the first _. For example the following vector:

x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f") 

Should come out like this:

"NYC_23DF" "BOS_3_RB" "mgh_3_3_F"

I have been trying to play with regular expressions, but am not able to do this. Any suggestions would be appreciated.

Upvotes: 17

Views: 4910

Answers (3)

G. Grothendieck

Reputation: 270348

gsubfn in the gsubfn package is like gsub, except that the replacement can be a function. Here we match _ and everything after it, then pass the match through toupper:

library(gsubfn)

gsubfn("_.*", toupper, x)
## [1] "NYC_23DF"  "BOS_3_RB"  "mgh_3_3_F"

Note that this approach involves a particularly simple regular expression.
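For readers who would rather not add a package, the same "find the match, pass it through toupper" idea can be sketched in base R with regexpr and regmatches. This is a different technique from gsubfn, not a description of how it works, and the extra element "a" and the variable names y and m are only illustrative:

```r
# Sample data plus one element without an underscore
y <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f", "a")

# Locate the first underscore and everything after it in each element
m <- regexpr("_.*", y)

# Replace each matched substring with its upper-case version;
# elements with no match (here "a") are left untouched
regmatches(y, m) <- toupper(regmatches(y, m))

print(y)
```

Because regexpr reports only the first match per element and .* runs to the end of the string, one match per element is enough here.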

Upvotes: 14

Ben Bolker

Reputation: 226951

You were very close:

gsub("(_.*)","\\U\\1",x,perl=TRUE)

seems to work. You just needed to use _.* (underscore followed by zero or more other characters) rather than _* (zero or more underscores) ...

To take this apart a bit more:

  • _.* gives a regular expression pattern that matches an underscore _ followed by any number (including 0) of additional characters; . denotes "any character" and * denotes "zero or more repeats of the previous element"
  • surrounding this regular expression with parentheses () denotes that it is a pattern we want to store
  • \\1 in the replacement string says "insert the contents of the first matched pattern", i.e. whatever matched _.*
  • \\U, in conjunction with perl=TRUE, says "put what follows in upper case" (uppercasing _ has no effect; if we wanted to capitalize everything after (for example) a lower-case g, we would need to exclude the g from the stored pattern and include it in the replacement pattern: gsub("g(.*)","g\\U\\1",x,perl=TRUE))

For more details, search for "replacement" and "capitalizing" in ?gsub (and see ?regexp for general information about regular expressions).
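The points above can be checked with a short sketch. The test string "plain" and the result names after_underscore, no_match and after_g are made up for illustration:

```r
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# "_.*" starts at the first underscore and, being greedy, runs to the end
after_underscore <- gsub("(_.*)", "\\U\\1", x, perl = TRUE)
print(after_underscore)

# A string with no underscore has no match and is returned unchanged
no_match <- gsub("(_.*)", "\\U\\1", "plain", perl = TRUE)
print(no_match)

# Delimiter kept outside the group: upper-case what follows the first "g"
after_g <- gsub("g(.*)", "g\\U\\1", "mgh_3_3_f", perl = TRUE)
print(after_g)
```

Since .* consumes the rest of the string, each element has at most one match, so sub would behave the same as gsub for these patterns.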

Upvotes: 27

Rappster

Reputation: 13100

Simple example using base::strsplit

x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f", "a") 

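myCap <- function(x) {
    # Process each element separately; sapply keeps the inputs as names
    out <- sapply(x, function(y) {
        # Split on every underscore; the first piece stays unchanged
        temp <- unlist(strsplit(y, "_"))
        out <- temp[1]
        # Only rebuild the string if something follows the first underscore
        if (length(temp[-1])) {
            out <- paste(temp[1], paste(toupper(temp[-1]),
                collapse="_"), sep="_")
        }
        return(out)
    })
    out
}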

> myCap(x)
   NYC_23df    BOS_3_rb   mgh_3_3_f           a 
 "NYC_23DF"  "BOS_3_RB" "mgh_3_3_F"         "a" 

Example using the stringr package

pkg <- "stringr"
if (!require(pkg, character.only=TRUE)) {
    install.packages(pkg)
    require(pkg, character.only=TRUE)   
}

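myCap.2 <- function(x) {
    out <- sapply(x, function(y) {
        # Start/end position of the first underscore (NA if there is none)
        idx <- str_locate(y, "_")
        if (!all(is.na(idx[1,]))) {
            # Upper-case from the first underscore to the end of the string
            str_sub(y, idx[,1], nchar(y)) <- toupper(str_sub(y, idx[,1], nchar(y)))
        }
        return(y)
    })
    out
}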

> myCap.2(x)
   NYC_23df    BOS_3_rb   mgh_3_3_f           a 
 "NYC_23DF"  "BOS_3_RB" "mgh_3_3_F"         "a" 

Upvotes: 4
