Reputation: 110062
Using only base R functions, what is the fastest way to capitalize the first letter of each string in a character vector?
I have provided a solution below but it seems to be a very inefficient approach (using substring and pasting it all together). I'm guessing there's a regex solution I'm not thinking of.
Once I have a few responses I'll benchmark them and report back the fastest solution using microbenchmarking.
Thank you in advance for your help.
x <- c("i like chicken.", "mmh so good", NA)
#desired output
[1] "I like chicken." "Mmh so good" NA
Upvotes: 4
Views: 253
Reputation: 49830
I haven't timed it, but I bet this is pretty fast:
capitalize <- function(string) {
    #substring(string, 1, 1) <- toupper(substring(string, 1, 1))
    substr(string, 1, 1) <- toupper(substr(string, 1, 1))
    string
}
capitalize(x)
#[1] "I like chicken." "Mmh so good" NA
Upvotes: 5
Reputation: 102306
The Hmisc package contains a capitalize function:
> require(Hmisc)
> capitalize(c("i like chicken.", "mmh so good", NA))
[1] "I like chicken." "Mmh so good" NA
(Although this appears to be slower than both the substring and regular expression versions.)
Upvotes: 3
Reputation: 19224
I think this will be the slowest, but let it race against the other solutions:
capitalize <- function(string) {
    sub("^(.)", "\\U\\1", string, perl = TRUE)
}
x <- c("i like chicken.", "mmh so good", NA)
capitalize(x)
EDIT: Actually, on ideone it is faster than the substring version.
EDIT 2: matching any lowercase letter turns out to be slightly slower:
sub("^(\\p{Ll})","\\U\\1", string, perl=TRUE)
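For anyone unsure how the replacement string works: with perl = TRUE, \\U converts the rest of the replacement to upper case and \\E ends the conversion. In the calls above only the captured first character follows \\U, so nothing else is touched. A small sketch (the variable names res_u and res_e are just for illustration):

```r
x <- c("i like chicken.", "mmh so good", NA)

# \\U upper-cases whatever follows it in the replacement;
# here that is only capture group 1, the first character.
res_u <- sub("^(.)", "\\U\\1", x, perl = TRUE)

# \\E ends the case conversion, which matters once more text
# follows in the replacement (group 2 is left unchanged).
res_e <- sub("^(\\w)(\\w*)", "\\U\\1\\E\\2", x, perl = TRUE)

print(res_u)
# [1] "I like chicken." "Mmh so good"     NA
print(res_e)
# [1] "I like chicken." "Mmh so good"     NA
```

NA elements pass through sub unchanged, so no special handling is needed.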
Upvotes: 4
Reputation: 110062
My solution using substring:
capitalize <- function(string) {
    cap <- function(x) {
        if (is.na(x)) {
            NA
        } else {
            nc <- nchar(x)
            paste0(toupper(substr(x, 1, 1)), substr(x, 2, nc))
        }
    }
    sapply(string, cap, USE.NAMES = FALSE)
}
x <- c("i like chicken.", "mmh so good", NA)
capitalize(x)
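For a rough comparison, here is a minimal base-R timing sketch using system.time. The function names (cap_substr, cap_regex, cap_sapply) and the test vector size are my own choices for illustration, and the timings will vary by machine and R version, so treat this as a harness rather than a result.

```r
# Three variants from this thread, renamed so they can coexist
# (the cap_* names are hypothetical, not from any package).
cap_substr <- function(s) {
  substr(s, 1, 1) <- toupper(substr(s, 1, 1))
  s
}
cap_regex <- function(s) sub("^(.)", "\\U\\1", s, perl = TRUE)
cap_sapply <- function(s) {
  sapply(s, function(x) {
    if (is.na(x)) NA else paste0(toupper(substr(x, 1, 1)), substr(x, 2, nchar(x)))
  }, USE.NAMES = FALSE)
}

x <- c("i like chicken.", "mmh so good", NA)
big <- rep(x, 1e4)  # arbitrary size, chosen only to make timings measurable

# Check that all three agree before timing them.
stopifnot(identical(cap_substr(x), cap_regex(x)),
          identical(cap_substr(x), cap_sapply(x)))

# Elapsed seconds for one run of each variant on the larger vector.
timings <- sapply(
  list(substr = cap_substr, regex = cap_regex, sapply = cap_sapply),
  function(f) system.time(f(big))[["elapsed"]]
)
print(timings)
```

A single system.time run is noisy; the microbenchmark package repeats each expression many times and would give more reliable numbers.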
Upvotes: 1