Trevor Hayes

Issue with logical operators in R, using ifelse

Below is the code I'm using. The idea is that we have a variable "habitaciones" which counts the number of rooms in a hotel. I create a new variable called hrango, which assigns a size category based on the number of rooms: hotels with 20 or fewer rooms are small, those with 21 to 40 are medium, and those with more than 40 are large. You can see how I tried to do this using comparison operators.

The problem is that when I run this code, any hotel with fewer than 10 rooms is labeled as large and any hotel with more than 100 rooms is labeled as small, and I can't figure out why. I started with code based on "replace", which wasn't working properly, then moved to ifelse, and I still can't get the result I want.

Any help would be appreciated.

full_data <- full_data %>%
  mutate(hrango = ifelse(habitaciones < 21, "Hoteles Pequenos",
                         ifelse(habitaciones > 20 & habitaciones < 41, "Hoteles Medios",
                                ifelse(habitaciones > 40, "Hoteles Grandes", hrango))))
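
The symptoms described above (fewer than 10 rooms labeled large, more than 100 labeled small) match what happens when habitaciones is stored as character rather than numeric: R then coerces the number in habitaciones < 21 to the string "21" and compares the values as text, character by character. The question doesn't show how full_data is built, so this is an assumption; class(full_data$habitaciones) will confirm or rule it out. A minimal base-R sketch, with hypothetical values and a hypothetical helper clasificar() that uses simplified first-match thresholds:

```r
# Hypothetical room counts, stored as character (e.g. read from a file
# where the column contains some non-numeric text).
habitaciones <- c("8", "15", "30", "55", "120")

# Hypothetical helper using the question's thresholds (first match wins).
clasificar <- function(x) {
  ifelse(x < 21, "Hoteles Pequenos",
         ifelse(x < 41, "Hoteles Medios", "Hoteles Grandes"))
}

# Character input: 21 and 41 are coerced to "21" and "41" and compared as
# strings, so "8" sorts after both (large) and "120" sorts before "21" (small).
como_texto <- clasificar(habitaciones)

# Numeric input: the comparisons behave as intended.
como_numero <- clasificar(as.numeric(habitaciones))

print(data.frame(habitaciones, como_texto, como_numero))
```

If the column does turn out to be character, converting it with as.numeric() before classifying should fix the labels; any entry that isn't a valid number becomes NA with a coercion warning.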

Answers (1)

r2evans

  1. In general, base R's ifelse has some baggage, notably that it drops the class of its inputs (for example, POSIXct or Date values come back as plain numbers). For example:

    ifelse(c(TRUE, FALSE), rep(Sys.time(), 2), rep(Sys.time(), 2))
    # [1] 1591376254 1591376254
    

    Since you're already using dplyr, I suggest you consider dplyr::if_else:

    if_else(c(TRUE, FALSE), rep(Sys.time(), 2), rep(Sys.time(), 2))
    # [1] "2020-06-05 09:57:57 PDT" "2020-06-05 09:57:57 PDT"
    

    (data.table::fifelse also preserves the class and is a good alternative.)

  2. When I see nested ifelse calls, I think case_when is a better fit. It usually isn't faster (performance is about the same), but it is much more readable and therefore easier to maintain.

    full_data %>%
      mutate(
        hrango = case_when(
          habitaciones < 21                     ~ "Hoteles Pequenos",
          habitaciones > 20 & habitaciones < 41 ~ "Hoteles Medios",
          habitaciones > 40                     ~ "Hoteles Grandes",
          TRUE ~ hrango)
      )
    

    Since case_when gives each element the value of the first condition that is TRUE for it, you could shorten this a little:

    full_data %>%
      mutate(
        hrango = case_when(
          habitaciones < 21 ~ "Hoteles Pequenos",
          habitaciones < 41 ~ "Hoteles Medios",
          habitaciones > 40 ~ "Hoteles Grandes",
          TRUE ~ hrango)
      )
    
  3. Further, since you're binning a single numeric variable into contiguous ranges, you could use cut:

    full_data %>%
      mutate(
        hrango = cut(habitaciones, c(-Inf, 20, 40, Inf),
                     labels = c("Hoteles Pequenos", "Hoteles Medios", "Hoteles Grandes")),
        hrango = as.character(hrango)
      )
    

    The second assignment (as.character) is there because cut returns a factor, and I'm inferring you want character.

    One more note: I'm inferring that you are using integers, since your conditions (< 21 and > 20) overlap for non-integer values. The case_when solution will still run on non-integer data, but you might need to rethink your boundaries to make sure you get what you need. Since cut defaults to right-closed intervals (right = TRUE), each interval is open on the left: with these breaks you get (-Inf,20], (20,40], and (40,Inf], so 20 falls in the first group while 20.1 and 21 fall in the second, and so on.
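
To make the boundary difference concrete, here is a small base-R sketch with hypothetical non-integer values. The nested ifelse stands in for the shortened case_when version (both use the first condition that matches), written without dplyr only to keep the example dependency-free. It compares those threshold rules, cut with the breaks used above, and cut with right = FALSE and breaks at 21 and 41:

```r
# Hypothetical non-integer room values around the boundaries.
rooms <- c(20, 20.5, 21, 40, 40.5, 41)
lbl <- c("Hoteles Pequenos", "Hoteles Medios", "Hoteles Grandes")

# First-match threshold rules (< 21, then < 41), same logic as the
# shortened case_when version.
by_threshold <- ifelse(rooms < 21, lbl[1], ifelse(rooms < 41, lbl[2], lbl[3]))

# cut() with the default right = TRUE: (-Inf,20], (20,40], (40,Inf]
by_cut <- as.character(cut(rooms, c(-Inf, 20, 40, Inf), labels = lbl))

# cut() with right = FALSE: [-Inf,21), [21,41), [41,Inf)
by_cut_left <- as.character(cut(rooms, c(-Inf, 21, 41, Inf), labels = lbl,
                                right = FALSE))

print(data.frame(rooms, by_threshold, by_cut, by_cut_left))
```

With these values, by_cut differs from the threshold rules only at 20.5 and 40.5, while by_cut_left matches them exactly, so right = FALSE with breaks at 21 and 41 is one way to make cut reproduce the < 21 / < 41 logic for non-integer data.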
