maia-sh

Reputation: 641

`stringr` to convert first letter only to uppercase in dataframe

I would like to capitalize the first letter of each word in a column without converting the remaining letters to lowercase. I am trying to use stringr since it's vectorized and plays well with dataframes, but I would also use another solution. Below is a reprex showing my desired output and various attempts. I am able to select the first letter only, but I am not sure how to then capitalize it. Thank you for your help!

I also reviewed related posts, but wasn't sure how to apply those solutions in my case (i.e., within a dataframe):

First letter to upper case

Capitalize the first letter of both words in a two word string

library(dplyr)
library(stringr)

words <-
  tribble(
    ~word, ~number,
    "problems", 99,
    "Answer", 42,
    "golden ratio", 1.61,
    "NOTHING", 0
  )

# Desired output
new_words <-
  tribble(
    ~word, ~number,
    "Problems", 99,
    "Answer", 42,
    "Golden Ratio", 1.61,
    "NOTHING", 0
  )

# Converts first letter of each word to upper and all other to lower
mutate(words, word = str_to_title(word))
#> # A tibble: 4 x 2
#>   word         number
#>   <chr>         <dbl>
#> 1 Problems      99   
#> 2 Answer        42   
#> 3 Golden Ratio   1.61
#> 4 Nothing        0

# Some attempts
mutate(words, word = str_replace_all(word, "(?<=^|\\s)([a-zA-Z])", "X"))
#> # A tibble: 4 x 2
#>   word         number
#>   <chr>         <dbl>
#> 1 Xroblems      99   
#> 2 Xnswer        42   
#> 3 Xolden Xatio   1.61
#> 4 XOTHING        0
mutate(words, word = str_replace_all(word, "(?<=^|\\s)([a-zA-Z])", "\\1"))
#> # A tibble: 4 x 2
#>   word         number
#>   <chr>         <dbl>
#> 1 problems      99   
#> 2 Answer        42   
#> 3 golden ratio   1.61
#> 4 NOTHING        0

Created on 2021-07-26 by the reprex package (v2.0.0)

Upvotes: 7

Views: 5342

Answers (3)

TarJae

Reputation: 78937

We could use the str_to_title() function from the stringr package.

The problem is that NOTHING turns into Nothing.

But we can work around this with ifelse(): if the whole string consists of uppercase letters, leave it as is; otherwise apply str_to_title().

library(dplyr)
library(stringr)
words %>% 
    mutate(word = ifelse(str_detect(word, "^[:upper:]+$"), word, str_to_title(word)))

Output:

  word         number
  <chr>         <dbl>
1 Problems      99   
2 Answer        42   
3 Golden Ratio   1.61
4 NOTHING        0 
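A quick check of where this ifelse() approach falls short, using two hypothetical extra values that are not in the question. The uppercase test only protects strings made up entirely of uppercase letters, so a string containing a space, or a word with internal capitals, still goes through str_to_title() and loses its capitals.

```r
library(stringr)

# Hypothetical inputs, not from the question
x <- c("NOTHING ELSE", "McDonald")

# Same logic as the answer: keep all-uppercase strings, title-case the rest.
# "NOTHING ELSE" fails "^[:upper:]+$" because of the space.
out <- ifelse(str_detect(x, "^[:upper:]+$"), x, str_to_title(x))

out
# Expected values: "Nothing Else", "Mcdonald"
```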

Upvotes: 3

Arthur Yip

Reputation: 6230

According to https://community.rstudio.com/t/is-there-will-there-be-perl-support-in-stringr/38016/3, stringr uses stringi and the ICU regex engine, so it does not (and will not) support Perl-style features such as the \U\1 case conversion in the replacement string used in other answers. So you should use the gsub() with perl = TRUE answer by @Tim.
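A stringr-only sketch that sidesteps the missing \U support, assuming a stringr release whose str_replace_all() accepts a function as the replacement (recent releases document this: the function is called on the matched text and its return value is substituted). This is an alternative not taken from any answer here. It uses \b to find word starts, so a lowercase letter after an apostrophe or hyphen is also capitalized.

```r
library(dplyr)
library(stringr)

words <- tibble::tribble(
  ~word, ~number,
  "problems", 99,
  "Answer", 42,
  "golden ratio", 1.61,
  "NOTHING", 0
)

# Match a lowercase letter at the start of a word and pass the matches
# to toupper(); letters that are already uppercase are never matched,
# and the rest of each word is left untouched.
new_words <- words %>%
  mutate(word = str_replace_all(word, "\\b[a-z]", toupper))

new_words$word
# Expected values: "Problems", "Answer", "Golden Ratio", "NOTHING"
```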

Upvotes: 1

Tim Biegeleisen

Reputation: 521467

Here is a base R solution using gsub:

words$word <- gsub("\\b([a-z])", "\\U\\1", words$word, perl=TRUE)

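This replaces the first letter of every word with its uppercase version whenever that letter is lowercase; letters that are already uppercase, and the rest of each word, are left unchanged. Note that the \b word boundary matches wherever a word character meets a non-word character, so a lowercase letter is matched when it is preceded by whitespace or the start of the column's value, but also when it follows punctuation such as an apostrophe or hyphen.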
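The \b boundary fires after any non-word character, not only after whitespace. A small base R comparison with hypothetical inputs containing an apostrophe and a hyphen: the first call is the answer's pattern, the second is an alternative (not from the answer) that treats a letter as a word start only when it follows whitespace or the start of the string.

```r
# Hypothetical inputs, not from the question
x <- c("golden ratio", "don't stop", "well-known")

# The answer's pattern: \b also matches after ' and -
boundary <- gsub("\\b([a-z])", "\\U\\1", x, perl = TRUE)

# Whitespace-anchored variant: group 1 re-inserts the preceding space
# (or the empty string at the start), then \U upper-cases group 2 only
spaced <- gsub("(^|\\s)([a-z])", "\\1\\U\\2", x, perl = TRUE)

boundary
# Expected values: "Golden Ratio", "Don'T Stop", "Well-Known"
spaced
# Expected values: "Golden Ratio", "Don't Stop", "Well-known"
```

Which behavior is preferable for hyphenated words depends on the data; the whitespace-anchored version mainly avoids the "Don'T" problem.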

Upvotes: 6
