Mary Rachel

Reputation: 331

How to mutate all values except for a vector of selected values with case_when()

I'm cleaning a list of business names and I'm struggling to convert only some of them to title case. I can use mutate(str_to_title(...)) to convert the whole field to title case, and that works well for most of my values, but a handful are capitalized like "ABC Company" or "John Doe Company LLC", and applying title case breaks them ("Abc Company" and "John Doe Company Llc").

I thought I could use case_when() and a vector of specific values to tell R to apply title case only to values that are not in the vector I specify. However, I either get the warning "longer object length is not a multiple of shorter object length" and all the values are converted to title case, or I get NAs for the values in my vector and correctly title-cased values for everything else. Where am I going wrong?

# Example Code #

library(tidyverse)

## Reproducible Example ##

test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC", 
"rainbow road company", "yellow brick road incorporated", "XYZ", 
"Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -7L))

test<-test%>%
  mutate(`Company Name`= case_when(`Company Name`!= c("ABC Company","John Doe Company LLC","XYZ") ~ str_to_title(`Company Name`)))

# Error #
Warning message:
There was 1 warning in `mutate()`.
ℹ In argument: `Company Name = case_when(...)`.
Caused by warning in `` `Company Name` != c("ABC Company", "John Doe Company LLC", "XYZ") ``:
! longer object length is not a multiple of shorter object length 

Upvotes: 3

Views: 84

Answers (2)

Brian

Reputation: 8295

This is a different approach that handles the more general case: if your original data contains all-caps words such as "LLC", we can preserve those and title-case everything else.

First we find the locations of any all-caps runs (two or more capitals in a row), then we title-case everything, and finally put the all-caps runs back in their original positions. An if-block skips the replacement step when a string has no all-caps runs.

library(stringr)
library(dplyr)


test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC", 
                                        "rainbow road company", "yellow brick road incorporated", "XYZ", 
                                        "Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df", 
                                                                                                            "tbl", "data.frame"), row.names = c(NA, -7L))

respectful_title = function(s) {
  # find any runs of two or more capitals in a row (e.g. "ABC", "LLC")
  caps = str_locate_all(s, "[A-Z]{2,}")

  purrr::map2_chr(
    s, caps, ~{
      if (nrow(.y)) {
        x_ = str_to_title(.x)
        # replace the title-cased runs with their originals one at a time,
        # so a string with several runs (e.g. "ABC Company LLC") stays length 1
        for (i in seq_len(nrow(.y))) {
          str_sub(x_, .y[i, 1], .y[i, 2]) <- str_sub(.x, .y[i, 1], .y[i, 2])
        }
        x_
      } else {
        str_to_title(.x)
      }
    }
  )
}

And we can see that it works with your test data:


test %>% 
  mutate(fixed = respectful_title(`Company Name`))
#> # A tibble: 7 × 2
#>   `Company Name`                 fixed                         
#>   <chr>                          <chr>                         
#> 1 ABC Company                    ABC Company                   
#> 2 John Doe Company LLC           John Doe Company LLC          
#> 3 rainbow road company           Rainbow Road Company          
#> 4 yellow brick road incorporated Yellow Brick Road Incorporated
#> 5 XYZ                            XYZ                           
#> 6 Mostly Ghostly Company         Mostly Ghostly Company        
#> 7 hot Leaf juice tea company     Hot Leaf Juice Tea Company

Created on 2024-11-26 with reprex v2.1.1
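If the goal is only to protect whole words written in capitals, a single regular expression can do the same job without locating and restoring positions. This is an alternative sketch, not part of the answer above: `title_keep_caps()` is a hypothetical name, and it assumes a stringr version whose `str_replace_all()` accepts a function as the replacement (the other answer's `capitalize_words()` relies on the same feature).

```r
library(stringr)

# Hypothetical helper (not from the answer above): title-case every word
# except those made entirely of two or more capital letters ("ABC", "LLC").
# (?![A-Z]{2,}\b) is a negative lookahead that rejects all-caps words.
title_keep_caps <- function(s) {
  str_replace_all(s, "\\b(?![A-Z]{2,}\\b)\\w+", str_to_title)
}

title_keep_caps(c("ABC Company LLC", "hot Leaf juice tea company"))
```

This returns "ABC Company LLC" and "Hot Leaf Juice Tea Company". Unlike `respectful_title()`, it only looks at whole words, so a word such as "ABCs" would still be title-cased to "Abcs".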

Upvotes: 1

Tim G

Reputation: 4147

However, I either come up with a warning that "longer object length is not a multiple of shorter object length", and all the values are converted to title case, or I simply get NAs for the vector values in my field and correct title case values for the values not in my vector. Where am I going wrong?

When you mutate `Company Name` with `case_when()`, compare with `%in%` instead of `!=` and specify a default case, like this:

case_when( 
    !(`Company Name` %in% c("ABC Company","John Doe Company LLC","XYZ")) ~ str_to_title(`Company Name`), # ! negates %in%: TRUE when the name is not one of the exceptions
    .default = `Company Name`
)

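Because your example has no `.default`, every row where the first condition is not TRUE is filled with NA. The warning comes from `!=`: it compares element by element and recycles the three exceptions across the seven rows, so each name is checked against only one of the exceptions. `%in%` instead checks each name against the whole vector.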
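Both symptoms described in the question can be reproduced on a plain character vector. This is a minimal sketch; `company` and `exceptions` are illustrative names, and `suppressWarnings()` is only there to silence the recycling warning so the comparison can be inspected.

```r
library(dplyr)
library(stringr)

# Illustrative data: the seven names from the question and the three exceptions
company <- c("ABC Company", "John Doe Company LLC", "rainbow road company",
             "yellow brick road incorporated", "XYZ",
             "Mostly Ghostly Company", "hot Leaf juice tea company")
exceptions <- c("ABC Company", "John Doe Company LLC", "XYZ")

# `!=` works element by element: `exceptions` is recycled to length 7
# (hence the warning), so "XYZ" in row 5 is compared with "John Doe Company LLC"
neq <- suppressWarnings(company != exceptions)
neq
#> [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

# `%in%` instead asks "is this name anywhere in `exceptions`?"
is_exception <- company %in% exceptions
is_exception
#> [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE

# Without `.default`, case_when() returns NA wherever no condition is TRUE,
# i.e. exactly for the three exceptions
no_default <- case_when(!is_exception ~ str_to_title(company))
```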

Alternatively, you can use a function that only capitalizes words written entirely in lower case, which removes the need to define exceptions in the first place. I included both examples below :)

library(tidyverse)

## Reproducible Example ##

test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC", 
                                        "rainbow road company", "yellow brick road incorporated", "XYZ", 
                                        "Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df", 
                                                                                                            "tbl", "data.frame"), row.names = c(NA, -7L))

# Function to capitalize words selectively:
# only words written entirely in lower case are title-cased
capitalize_words <- function(input_string) {
  str_replace_all(input_string, "\\b[a-z]+\\b", str_to_title)
}

test <- test %>%
  mutate(`Capitalized Company Names case when` = case_when(
           !(`Company Name` %in% c("ABC Company", "John Doe Company LLC", "XYZ")) ~ str_to_title(`Company Name`),
           .default = `Company Name`),
         `Capitalized Company Names with function` = capitalize_words(`Company Name`))

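and end up with two new columns that hold the same values for this data: the lower-case names are title-cased ("Rainbow Road Company", "Hot Leaf Juice Tea Company"), while "ABC Company", "John Doe Company LLC" and "XYZ" are left unchanged.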
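Since there are only two outcomes (keep the name or title-case it), `dplyr::if_else()` is another option for the exception-list approach. A minimal sketch, rebuilding the question's data with `tibble()` and using a hypothetical `keep_as_is` vector for the exceptions:

```r
library(dplyr)
library(stringr)

# Same names as the question's data, built with tibble() for brevity
test <- tibble(`Company Name` = c("ABC Company", "John Doe Company LLC",
  "rainbow road company", "yellow brick road incorporated", "XYZ",
  "Mostly Ghostly Company", "hot Leaf juice tea company"))

# Hypothetical list of names that should keep their existing capitalization
keep_as_is <- c("ABC Company", "John Doe Company LLC", "XYZ")

# if_else() always has both branches, so no NA appears for the exceptions
test <- test %>%
  mutate(`Company Name` = if_else(`Company Name` %in% keep_as_is,
                                  `Company Name`,
                                  str_to_title(`Company Name`)))
```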

Upvotes: 1
