Reputation: 21
Below is the code I'm using. We have a variable "habitaciones" that counts the number of rooms in a hotel. I create a new variable called hrango that groups hotels into ranges by number of rooms: hotels with 20 or fewer rooms are small, those with 21 to 40 are medium, and those with more than 40 are large. You can see how I tried to do this with comparison operators.
The problem is that when I run this code, any hotel with fewer than 10 rooms is labeled as large and any hotel with more than 100 rooms is labeled as small, and I cannot figure out why. I started with code based on "replace", which wasn't working properly, then moved to ifelse, and I still can't get the result I want.
Any help would be appreciated.
full_data <- full_data %>%
  mutate(hrango = ifelse(habitaciones < 21, "Hoteles Pequenos",
                    ifelse(habitaciones > 20 & habitaciones < 41, "Hoteles Medios",
                      ifelse(habitaciones > 40, "Hoteles Grandes", hrango)
                    )
                  )
  )
Upvotes: 1
Views: 435
Reputation: 160447
In general, base R's ifelse has some baggage, namely that it drops the class of its result. As an example, a date-time comes back as a bare number:
ifelse(c(T,F), rep(Sys.time(),2), rep(Sys.time(),2))
# [1] 1591376254 1591376254
Since you're already using dplyr, I suggest you consider dplyr::if_else, which keeps the class:
if_else(c(T,F), rep(Sys.time(),2), rep(Sys.time(),2))
# [1] "2020-06-05 09:57:57 PDT" "2020-06-05 09:57:57 PDT"
(data.table::fifelse is also good.)
When I see nested ifelses, I think case_when would be better. It is rarely faster (performance is about the same), but it is much more readable and therefore easier to maintain.
full_data %>%
  mutate(
    hrango = case_when(
      habitaciones < 21                     ~ "Hoteles Pequenos",
      habitaciones > 20 & habitaciones < 41 ~ "Hoteles Medios",
      habitaciones > 40                     ~ "Hoteles Grandes",
      TRUE                                  ~ hrango)
  )
Since case_when uses, for each element, the first condition that is true, you could shorten this a little:
full_data %>%
  mutate(
    hrango = case_when(
      habitaciones < 21 ~ "Hoteles Pequenos",
      habitaciones < 41 ~ "Hoteles Medios",
      habitaciones > 40 ~ "Hoteles Grandes",
      TRUE              ~ hrango)
  )
Further, since you're just splitting a continuous range of values into bins, you could use cut:
full_data %>%
  mutate(
    # bin the room counts into three ranges
    hrango = cut(habitaciones, c(-Inf, 20, 40, Inf),
                 labels = c("Hoteles Pequenos", "Hoteles Medios", "Hoteles Grandes")),
    hrango = as.character(hrango)
  )
where the second assignment, with as.character, is there because cut returns a factor and I'm inferring you want character.
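To check the cut approach outside of a pipeline, here is a minimal sketch in base R. The room counts are made-up values chosen to sit around the break points, and the variables are plain vectors rather than columns of full_data.

```r
# Made-up room counts, including values right at and around the breaks
habitaciones <- c(5, 20, 20.1, 21, 40, 41, 150)

# Same breaks and labels as above; cut() returns a factor
hrango <- cut(habitaciones, c(-Inf, 20, 40, Inf),
              labels = c("Hoteles Pequenos", "Hoteles Medios", "Hoteles Grandes"))
class(hrango)
# [1] "factor"

# Convert to character if a plain string column is wanted
hrango <- as.character(hrango)
class(hrango)
# [1] "character"

# Show each room count next to its label
data.frame(habitaciones, hrango)
```

With these values, 5 and 20 are labeled small, 20.1, 21 and 40 are labeled medium, and 41 and 150 are labeled large.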
One more note: I'm inferring that you are using integers because of how your conditions are written (< 21 followed by > 20). The case_when solution will still work, but if your data is numeric with fractional values then you might need to rethink your boundaries to make sure you are getting what you need. Since cut defaults to "right-closed" (right=TRUE), its intervals are left-open: with the breaks above they are (-Inf,20], (20,40] and (40,Inf], so 20 will match the first, 20.1 and 21 will match the second, etc. The case_when conditions, by contrast, put 20.1 in the first group because 20.1 < 21.
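None of the rewrites above explains the symptom in the question on its own (fewer than 10 rooms labeled large, more than 100 labeled small). One possible cause is that habitaciones was read in as character rather than numeric. This is an assumption, because the question does not show the column's type. In that case R coerces the number to a string and compares text, which would produce that pattern. A minimal sketch with made-up values:

```r
# Assumption: habitaciones was imported as character, not numeric
habitaciones <- c("5", "30", "100")
class(habitaciones)
# [1] "character"

# 21 and 40 are coerced to "21" and "40", and the strings are
# compared character by character
habitaciones < 21
# [1] FALSE FALSE  TRUE
habitaciones > 40
# [1]  TRUE FALSE FALSE

# Converting to numeric first restores the intended comparison
habitaciones_num <- as.numeric(habitaciones)
habitaciones_num < 21
# [1]  TRUE FALSE FALSE
habitaciones_num > 40
# [1] FALSE FALSE  TRUE
```

If class(full_data$habitaciones) reports "character", converting the column with as.numeric before binning may be all that is needed. If it reports "factor", as.numeric(as.character(...)) is the usual conversion.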
Upvotes: 4