josephn

Reputation: 49

R: Error when trying to use mutate with case_when

I am trying to add a column to a data frame that holds the region of each US state. I have tried the following code and keep getting an error message. I'm new to the tidyverse, so any help you can offer would be appreciated. I'm guessing it's something small and embarrassing. :)

df <- df %>%
  mutate(region = case_when((State=="Connecticut"|State=="Maine"|State=="Massachusetts"|State=="New Hampshire"|State=="Rhode Island"|State=="Vermont"~ "New England"), 
                            case_when((State=="Delaware"| State=="District of Columbia" | State=="Maryland"| State=="New Jersey"| State=="New York"| State=="Pennsylvania"~ "Central Atlanic"),
                                      case_when((State=="Florida"| State=="Georgia"| State=="North Carolina"|State=="South Carolina"|  State=="Virginia"| State=="West Virginia"~ "Lower Atlantic"),
                                                case_when((State=="Illinois"| State=="Indiana"| State=="Iowa"| State=="Kansas"| State=="Kentucky"| State=="Michigan"| State=="Minnesota"| State=="Missouri"| State=="Nebraska"| State=="North Dakota"| State=="Ohio"| State=="Oklahoma"| State=="South Dakota"| State=="Tennessee" |State=="Wisconsin"~ "Midwest"),
                                                          case_when((State=="Alabama" | State=="Arkansas" | State=="Louisiana"| State=="Mississippi"| State=="New Mexico"| State=="Texas"~ "Gulf Coast"), 
                                                                    case_when((State=="Colorado"| State=="Idaho" | State=="Montana"| State=="Utah"| State=="Wyoming"~ "Rocky Mountain"),
                                                                              case_when((State=="Alaska" | State=="Arizona" | State=="California"| State=="Hawaii" | State=="Nevada"| State=="Oregon"| State=="Washington"~ "West Coast"), TRUE~"NA"))))))))

Error in mutate():
! Problem while computing region = case_when(...).
Caused by error in case_when():
! Case 2 ((State == "Colorado" | State == "Idaho" | State == "Montana" | State == "Utah" | State == "Wyoming" ~ "Rocky Mountain")) must be a two-sided formula, not a character vector.

Upvotes: 0

Views: 916

Answers (1)

Parfait

Reputation: 107587

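As the docs show, there is no need to nest case_when. In your code each nested case_when() call is evaluated first and returns a character vector, which the enclosing case_when() then receives where it expects a two-sided formula (condition ~ value), hence the error. Simply separate the conditions with commas; they are evaluated in order and the first match wins. Also, consider %in% instead of the many OR comparisons.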

mutate(region = case_when(
    # One two-sided formula per region, separated by commas
    State %in% c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", "Vermont") ~ "New England",
    State %in% c("Delaware", "District of Columbia", "Maryland", "New Jersey", "New York", "Pennsylvania") ~ "Central Atlantic",
    ...,
    # NA_character_ matches the character type of the other outputs
    TRUE ~ NA_character_
))
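
For reference, here is a minimal self-contained sketch of the flat form. The sample df is made up for illustration (the asker's real data is not shown), and only the two regions spelled out above are included, so any other state falls through to the NA fallback:

```r
library(dplyr)

# Hypothetical sample data, not the asker's actual df
df <- data.frame(State = c("Maine", "New York", "Texas", "Vermont"))

df <- df %>%
  mutate(region = case_when(
    State %in% c("Connecticut", "Maine", "Massachusetts",
                 "New Hampshire", "Rhode Island", "Vermont") ~ "New England",
    State %in% c("Delaware", "District of Columbia", "Maryland",
                 "New Jersey", "New York", "Pennsylvania") ~ "Central Atlantic",
    # Fallback for states not listed above (here: Texas)
    TRUE ~ NA_character_
  ))

df$region
# [1] "New England"      "Central Atlantic" NA                 "New England"
```

Using NA_character_ rather than the string "NA" gives a real missing value, so is.na(df$region) can be used to find unassigned states.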

In fact, consider storing the state-to-region mapping in a lookup table and merging it in, which avoids conditional logic altogether:

txt = 'State Region
Connecticut "New England"
Maine "New England"
Massachusetts "New England"
"New Hampshire" "New England"
"Rhode Island" "New England"
Vermont "New England"
Delaware "Central Atlantic"
"District of Columbia" "Central Atlantic"
Maryland "Central Atlantic"
"New Jersey" "Central Atlantic"
"New York" "Central Atlantic"
Pennsylvania  "Central Atlantic"
...'

region_df <- read.table(text = txt, header = TRUE)
region_df
#                   State           Region
# 1           Connecticut      New England
# 2                 Maine      New England
# 3         Massachusetts      New England
# 4         New Hampshire      New England
# 5          Rhode Island      New England
# 6               Vermont      New England
# 7              Delaware Central Atlantic
# 8  District of Columbia Central Atlantic
# 9              Maryland Central Atlantic
# 10           New Jersey Central Atlantic
# 11             New York Central Atlantic
# 12         Pennsylvania Central Atlantic
# ...

# all.x = TRUE keeps states with no match in region_df (their Region becomes NA)
main_df <- merge(main_df, region_df, by = "State", all.x = TRUE)
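
One thing to watch with merge(): by default it performs an inner join, so rows whose State has no match in the lookup table are dropped rather than given an NA region. A small base R sketch with made-up stand-ins for main_df and region_df (the value column is hypothetical) shows the difference that all.x = TRUE makes:

```r
# Hypothetical stand-ins for the tables above
main_df <- data.frame(State = c("Texas", "Maine", "New York"),
                      value = c(10, 20, 30))
region_df <- data.frame(State  = c("Maine", "New York"),
                        Region = c("New England", "Central Atlantic"))

# Default merge is an inner join: Texas has no match and is dropped
inner <- merge(main_df, region_df, by = "State")
nrow(inner)

# all.x = TRUE keeps every row of main_df; unmatched states get Region = NA
left <- merge(main_df, region_df, by = "State", all.x = TRUE)
left
```

Note that merge() sorts the result by the key column by default, so the row order of main_df is not preserved.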

Upvotes: 1
