Reputation: 331
I'm cleaning a list of business names and I'm struggling to selectively convert them to title case. I can use mutate(str_to_title(...)) to convert the whole field to title case, and that works great for most of my values, but there are a handful that are titled like "ABC Company" or "John Doe Company LLC", and applying title case messes up their proper casing ("Abc Company" and "John Doe Company Llc").
I thought I could use case_when() and a vector of specific values to tell R to apply title case only to values that do not equal one of the values I specify. However, I either get a warning that "longer object length is not a multiple of shorter object length" and all the values are converted to title case, or I simply get NAs for the vector values in my field and correct title case values for the values not in my vector. Where am I going wrong?
# Example Code #
library(tidyverse)
## Reproducible Example ##
test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC",
"rainbow road company", "yellow brick road incorporated", "XYZ",
"Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L))
test<-test%>%
mutate(`Company Name`= case_when(`Company Name`!= c("ABC Company","John Doe Company LLC","XYZ") ~ str_to_title(`Company Name`)))
# Error #
Warning message:
There was 1 warning in `mutate()`.
ℹ In argument: `Company Name = case_when(...)`.
Caused by warning in `` `Company Name` != c("ABC Company", "John Doe Company LLC", "XYZ") ``:
! longer object length is not a multiple of shorter object length
Upvotes: 3
Views: 84
Reputation: 8295
This is a different approach to a more general case. If your original data has things like "LLC" in it, we can preserve those but title-case everything else.
First we find the locations of any all-caps words, then we title-case everything, and then replace the all-caps back into their original spots. There's an if-block as well for skipping when there's no all-caps to replace.
library(stringr)
library(dplyr)
test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC",
"rainbow road company", "yellow brick road incorporated", "XYZ",
"Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L))
respectful_title = function(s) {
  # find any runs of two or more capital letters
  caps = str_locate_all(s, "[A-Z]{2,}")
  purrr::map2(
    s, caps, ~{
      if (nrow(.y)) {
        x_ = str_to_title(.x)
        # put the original all-caps runs back in place of their title-cased versions
        str_sub(x_, .y[, 1], .y[, 2]) <- str_sub(.x, .y[, 1], .y[, 2])
        x_
      } else {
        str_to_title(.x)
      }
    }
  ) %>%
    unlist()
}
And we can see that it works with your test data:
test %>%
mutate(fixed = respectful_title(`Company Name`))
#> # A tibble: 7 × 2
#>   `Company Name`                 fixed
#>   <chr>                          <chr>
#> 1 ABC Company                    ABC Company
#> 2 John Doe Company LLC           John Doe Company LLC
#> 3 rainbow road company           Rainbow Road Company
#> 4 yellow brick road incorporated Yellow Brick Road Incorporated
#> 5 XYZ                            XYZ
#> 6 Mostly Ghostly Company         Mostly Ghostly Company
#> 7 hot Leaf juice tea company     Hot Leaf Juice Tea Company
Created on 2024-11-26 with reprex v2.1.1
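If a name can contain more than one all-caps word, a word-by-word variant may be easier to reason about. The following is only a sketch in base R, not the function above: title_case_keep_caps is a made-up name, and it assumes words are separated by single spaces and that there are no NA values.

```r
# Hypothetical helper (not from the answer above): title-case each word
# unless the word consists of two or more capital letters only.
# Assumes single-space separators and no NA input.
title_case_keep_caps <- function(s) {
  vapply(strsplit(s, " ", fixed = TRUE), function(words) {
    # words such as "ABC" or "LLC" are kept exactly as written
    is_all_caps <- grepl("^[A-Z]{2,}$", words, perl = TRUE)
    # first letter upper case, remaining letters lower case
    titled <- paste0(toupper(substring(words, 1, 1)),
                     tolower(substring(words, 2)))
    paste(ifelse(is_all_caps, words, titled), collapse = " ")
  }, character(1))
}

title_case_keep_caps(c("ABC XYZ company", "john doe company LLC"))
```

Because each word is decided on its own, the number of all-caps words in a name does not matter. It can be used inside mutate() the same way as respectful_title().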
Upvotes: 1
Reputation: 4147
> However, I either come up with a warning that "longer object length is not a multiple of shorter object length", and all the values are converted to title case, or I simply get NAs for the vector values in my field and correct title case values for the values not in my vector. Where am I going wrong?
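When you mutate Company Name with case_when(), you need to specify a default case, and you should test membership with %in% instead of != (comparing a column against a length-3 vector with != recycles the vector position by position, which is what triggers the length warning). It looks like this: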
case_when(
  # `!` negates the test: TRUE when Company Name is not one of the listed values
  !(`Company Name` %in% c("ABC Company", "John Doe Company LLC", "XYZ")) ~ str_to_title(`Company Name`),
  # every other row keeps its original value
  .default = `Company Name`
)
Since the default was missing in your example, rows that do not match the first condition have nothing to fall back on and are therefore filled with NA.
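To see why != misbehaves here while %in% does what you want, here is a small base R sketch using the names from the question. The variable names company, keep_as_is, neq, and not_listed are just illustrative.

```r
# Base R only; the values mirror the question's test data.
company <- c("ABC Company", "John Doe Company LLC", "rainbow road company",
             "yellow brick road incorporated", "XYZ",
             "Mostly Ghostly Company", "hot Leaf juice tea company")
keep_as_is <- c("ABC Company", "John Doe Company LLC", "XYZ")

# `!=` compares position by position and recycles the shorter vector.
# 7 is not a multiple of 3, hence the warning (suppressed here).
neq <- suppressWarnings(company != keep_as_is)
neq
# FALSE FALSE TRUE TRUE TRUE TRUE TRUE

# `%in%` asks, for each name, whether it appears anywhere in keep_as_is.
not_listed <- !(company %in% keep_as_is)
not_listed
# FALSE FALSE TRUE TRUE FALSE TRUE TRUE
```

Note position 5: with != the recycled vector lines "XYZ" up against "John Doe Company LLC", so "XYZ" counts as different and would be title-cased, whereas %in% correctly recognises it as one of the exceptions.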
Alternatively, you can use a function that title-cases only the words written entirely in lower case, which removes the need to define exceptions in the first place. I included both examples below :)
library(tidyverse)
## Reproducible Example ##
test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC",
"rainbow road company", "yellow brick road incorporated", "XYZ",
"Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L))
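# Function to title-case only the words that are entirely lower case;
# words that already contain a capital letter (e.g. "ABC", "LLC", "Leaf") are left alone
capitalize_words <- function(input_string) {
  str_replace_all(input_string, "\\b[a-z][a-z]*\\b", function(word) {
    str_to_title(word)
  })
}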
test <- test %>%
  mutate(
    `Capitalized Company Names case when` = case_when(
      !(`Company Name` %in% c("ABC Company", "John Doe Company LLC", "XYZ")) ~ str_to_title(`Company Name`),
      .default = `Company Name`
    ),
    `Capitalized Company Names with function` = capitalize_words(`Company Name`)
  )
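and end up with this result: in both new columns the lower-case words are title-cased, while "ABC Company", "John Doe Company LLC", and "XYZ" are left unchanged.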
Upvotes: 1