Joe Crozier

Reputation: 1036

Match Operator behaving strangely

So I have a data frame:

library(dplyr)

BMI <- c(18, 25.2, 31.5, 19.6, 11.1, 25.2)
AGE <- c(21, 35, 45, 60, 99, 21)
df <- tibble(BMI, AGE)  # data_frame() is deprecated in favour of tibble()

When I use the match operator (%in%) with AGE, it works just fine (those 30-50 years old show up as TRUE):

df<-df%>%mutate(MediumAge=if_else(AGE%in%30:50,TRUE,FALSE))

When I use the match operator with BMI, it doesn't (people with a BMI in that range don't show up as TRUE; the only one that does is the person with a BMI of exactly 18):

df<-df%>%mutate(Medium=if_else(BMI%in%18:29,TRUE,FALSE))

Obviously a BMI of 18 would probably not be "Medium", but for the sake of simple data in the example...

It must have something to do with decimal places, but I can't find anything in the documentation or a solution.

Upvotes: 1

Views: 44

Answers (1)

camille

Reputation: 16842

The %in% operator is a wrapper around match. It doesn't look at ranges of values; it checks whether each value on the left appears exactly in the vector on the right. The values don't have to be numeric. For example:

library(tidyverse)

letters[1:6]
#> [1] "a" "b" "c" "d" "e" "f"
"e" %in% letters[1:6]
#> [1] TRUE

Where you have 18:29, you're creating a vector of integers and then looking for matches of your BMI values in that vector. That's why you get TRUE for BMI = 18: that exact number is in the vector. 25.2 is not in the vector, so it returns FALSE. AGE only appeared to work because every age in your data is a whole number.

It's easier to see if you print out the vectors to test:

30:50
#>  [1] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
35 %in% 30:50
#> [1] TRUE

18:29
#>  [1] 18 19 20 21 22 23 24 25 26 27 28 29
25.2 %in% 18:29
#> [1] FALSE
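
To make the link to match concrete, here is a small sketch in base R only. The vector name bmi and the helper variables pos, hits, and manual are just illustrative names; the values are the BMI column from the question.

```r
# Base R only; no packages needed.
bmi <- c(18, 25.2, 31.5, 19.6, 11.1, 25.2)

# match() returns the position of each value in the table, or NA if absent.
pos <- match(bmi, 18:29)
pos
#> [1]  1 NA NA NA NA NA

# %in% reduces that to TRUE/FALSE: did match() find a position?
hits <- bmi %in% 18:29
hits
#> [1]  TRUE FALSE FALSE FALSE FALSE FALSE

# Spelling out the same check by hand gives an identical result.
manual <- match(bmi, 18:29, nomatch = 0L) > 0L
identical(hits, manual)
#> [1] TRUE
```

Only 18 has a position in 18:29, so it is the only TRUE.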

Since you want to know whether a value is in the continuous range between two numbers, you can either use inequalities:

df %>%
  mutate(Medium = (BMI >= 18 & BMI <= 29))
#> # A tibble: 6 x 3
#>     BMI   AGE Medium
#>   <dbl> <dbl> <lgl> 
#> 1  18      21 TRUE  
#> 2  25.2    35 TRUE  
#> 3  31.5    45 FALSE 
#> 4  19.6    60 TRUE  
#> 5  11.1    99 FALSE 
#> 6  25.2    21 TRUE

or dplyr::between, which is a shorthand for the inequalities above, inclusive of its endpoints.

df %>%
  mutate(Medium = between(BMI, 18, 29))
#> # A tibble: 6 x 3
#>     BMI   AGE Medium
#>   <dbl> <dbl> <lgl> 
#> 1  18      21 TRUE  
#> 2  25.2    35 TRUE  
#> 3  31.5    45 FALSE 
#> 4  19.6    60 TRUE  
#> 5  11.1    99 FALSE 
#> 6  25.2    21 TRUE

It's also worth noting that if you just want a logical value back, you can skip the if_else, because either of these checks already returns a logical vector.
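
As a quick check of that last point, here is a sketch that compares the bare comparison with the wrapped version. It uses base R's ifelse as a stand-in for dplyr's if_else so that it runs without any packages; in_range and wrapped are illustrative names.

```r
# Base R only; BMI values are the ones from the question.
BMI <- c(18, 25.2, 31.5, 19.6, 11.1, 25.2)

# The comparison already yields a logical vector.
in_range <- BMI >= 18 & BMI <= 29
in_range
#> [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE

# Wrapping it in ifelse(..., TRUE, FALSE) changes nothing.
wrapped <- ifelse(in_range, TRUE, FALSE)
identical(in_range, wrapped)
#> [1] TRUE
```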

Upvotes: 2
