Reputation: 1036
So I have a data frame:
library(dplyr)
BMI <- c(18, 25.2, 31.5, 19.6, 11.1, 25.2)
AGE <- c(21, 35, 45, 60, 99, 21)
df <- tibble(BMI, AGE)  # data_frame() is deprecated in favor of tibble()
When I use the match operator with AGE it works just fine (those 30-50 years old show up as TRUE):
df<-df%>%mutate(MediumAge=if_else(AGE%in%30:50,TRUE,FALSE))
When I use the match operator with BMI it doesn't (people with a BMI in that range don't show up as TRUE; the only one that does is the person with a BMI of exactly 18):
df<-df%>%mutate(Medium=if_else(BMI%in%18:29,TRUE,FALSE))
Obviously an 18 BMI would probably not be "Medium" but for sake of simple data in the example...
It must have something to do with decimal places, but I can't find anything in the documentation or a solution.
Reputation: 16842
The %in% operator is a wrapper around match. It doesn't look at ranges of values, but rather tries to find the match of a value in a vector. They don't have to be numeric. For example:
library(tidyverse)
letters[1:6]
#> [1] "a" "b" "c" "d" "e" "f"
"e" %in% letters[1:6]
#> [1] TRUE
Where you have 18:29, you're creating a vector of integers, and then looking for matches of your BMI values in that vector. That's why you get TRUE for BMI = 18, because that exact number is in that vector, but 25.2 is not in that vector, so it returns FALSE.
It's easier to see if you print out the vectors to test:
30:50
#> [1] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
35 %in% 30:50
#> [1] TRUE
18:29
#> [1] 18 19 20 21 22 23 24 25 26 27 28 29
25.2 %in% 18:29
#> [1] FALSE
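The same check can be run on the whole BMI vector from the question. This is a minimal sketch in base R; the names `in_result` and `match_result` are only illustrative. It shows that %in% reports whether match found a position for each value, which is why only the exact value 18 comes back TRUE:

```r
BMI <- c(18, 25.2, 31.5, 19.6, 11.1, 25.2)

# match() returns the position of each value in 18:29, or NA if it is absent
match_result <- match(BMI, 18:29)
match_result
#> [1]  1 NA NA NA NA NA

# %in% is TRUE exactly where match() found a position
in_result <- BMI %in% 18:29
in_result
#> [1]  TRUE FALSE FALSE FALSE FALSE FALSE
```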
So since you want to know if a value is in the continuous range between two numbers, you can either use inequalities:
df %>%
  mutate(Medium = (BMI >= 18 & BMI <= 29))
#> # A tibble: 6 x 3
#>     BMI   AGE Medium
#>   <dbl> <dbl> <lgl>
#> 1  18      21 TRUE
#> 2  25.2    35 TRUE
#> 3  31.5    45 FALSE
#> 4  19.6    60 TRUE
#> 5  11.1    99 FALSE
#> 6  25.2    21 TRUE
or dplyr::between, which is a shorthand for the inequalities above, inclusive of its endpoints:
df %>%
  mutate(Medium = between(BMI, 18, 29))
#> # A tibble: 6 x 3
#>     BMI   AGE Medium
#>   <dbl> <dbl> <lgl>
#> 1  18      21 TRUE
#> 2  25.2    35 TRUE
#> 3  31.5    45 FALSE
#> 4  19.6    60 TRUE
#> 5  11.1    99 FALSE
#> 6  25.2    21 TRUE
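To see the endpoint behaviour directly, here is a small sketch using a made-up test vector (`x` is hypothetical and not part of the question's data). It compares between against the explicit inequalities at and just outside both boundaries:

```r
library(dplyr)

# Hypothetical values sitting on and just outside the boundaries 18 and 29
x <- c(17.9, 18, 25.2, 29, 29.1)

by_between <- between(x, 18, 29)
by_inequality <- x >= 18 & x <= 29

by_between
#> [1] FALSE  TRUE  TRUE  TRUE FALSE

# Both approaches agree, including at the endpoints
identical(by_between, by_inequality)
#> [1] TRUE
```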
It's also worth noting that if you're just trying to get back a logical value, you can skip the if_else, because either of these methods of checking will return a logical already.
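As a quick check of that point, the sketch below (variable names are illustrative) wraps the range test in if_else the way the question does and compares it with the bare test. Assuming the BMI values from the question, the two results should be identical:

```r
library(dplyr)

BMI <- c(18, 25.2, 31.5, 19.6, 11.1, 25.2)

# The range test already yields a logical vector
in_range <- between(BMI, 18, 29)

# Wrapping it in if_else(..., TRUE, FALSE) changes nothing
wrapped <- if_else(in_range, TRUE, FALSE)

identical(wrapped, in_range)
#> [1] TRUE
```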