887

Reputation: 619

What is causing case_when double vector error?

I want to create a new variable that indicates whether an individual is on the left or the right side: if their 'y' value is greater than 'a' they are on the left, and if their 'y' value is less than 'a' they are on the right. I have tried this code:

df <- df %>% 
    mutate(Side = case_when(y > a ~ "Left", 
                            y < a ~ "Right",
                            y = a ~ "C"))

However, when I try it I get this error:

Error: Problem with `mutate()` input `Side`.
x object 'y' not found
ℹ Input `Side` is `case_when(y > a ~ "Left", y < a ~ "Right", y = a ~ "C")`.

I am very lost since both a and y are numeric vectors. Any idea why this is?

structure(list(y = c(26.85, 26.85, 26.85, 26.85, 26.85, 
26.85, 26.85, 26.85, 26.85, 26.85, 26.85, 26.85, 26.85, 26.85, 
26.85, 26.85, 26.85, 26.85, 26.85, 26.85), a = c(26.67, 36.47, 
44.16, 22.01, 36.15, 28.7, 26.63, 31.12, 20.53, 43.49, 21.83, 
26.59, 26.71, 26.85, 26.67, 36.47, 44.17, 22, 36.15, 28.7)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Views: 123

Answers (1)

r2evans

Reputation: 160717

= is assignment (inside a function call it names an argument), == is a test of equality. Change the last condition to y == a ~ "C".

df %>% 
  mutate(Side = case_when(y > a ~ "Left", 
                          y < a ~ "Right",
                          y == a ~ "C"))
# # A tibble: 20 x 3
#        y     a Side 
#    <dbl> <dbl> <chr>
#  1  26.8  26.7 Left 
#  2  26.8  36.5 Right
#  3  26.8  44.2 Right
#  4  26.8  22.0 Left 
#  5  26.8  36.2 Right
#  6  26.8  28.7 Right
#  7  26.8  26.6 Left 
#  8  26.8  31.1 Right
#  9  26.8  20.5 Left 
# 10  26.8  43.5 Right
# 11  26.8  21.8 Left 
# 12  26.8  26.6 Left 
# 13  26.8  26.7 Left 
# 14  26.8  26.8 C    
# 15  26.8  26.7 Left 
# 16  26.8  36.5 Right
# 17  26.8  44.2 Right
# 18  26.8  22   Left 
# 19  26.8  36.2 Right
# 20  26.8  28.7 Right

Having said that, beware: floating-point equality is not always reliable. Computers store non-integer numbers (aka double, numeric, float) with finite precision, so two values that are mathematically equal can differ in their last bits. This is a fundamental limitation of how computers represent such numbers, and it is not specific to any one programming language. Some add-on libraries and packages are much better at arbitrary-precision math, but I believe most mainstream languages (this is relative/subjective, I admit) do not use them by default. Refs: Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754.
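As a small illustration of the point above, here is a base-R sketch using the classic 0.1 + 0.2 case. The variable names are made up for this example and the values are not taken from the question's data.

```r
# Base R only; no packages are assumed.
x <- 0.1 + 0.2

exact    <- x == 0.3                   # FALSE: the stored doubles differ in the last bits
tolerant <- isTRUE(all.equal(x, 0.3))  # TRUE: all.equal() compares within a tolerance (about 1.5e-8 by default)
manual   <- abs(x - 0.3) < 1e-8        # TRUE: explicit absolute-difference test

print(c(exact = exact, tolerant = tolerant, manual = manual))
print(sprintf("%.17f", x))             # "0.30000000000000004"
```

Note that all.equal() returns a character description rather than FALSE when the values differ, which is why it is wrapped in isTRUE() here.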

If you find yourself in a situation where you "know" that two numbers are the same but == returns FALSE, consider testing the absolute difference against a small tolerance instead, perhaps something like:

eps <- 1e-8
df %>% 
  # case_when() uses the first condition that matches, so test the tolerance first
  mutate(Side = case_when(abs(y - a) < eps ~ "C",
                          y > a ~ "Left", 
                          y < a ~ "Right"))

The right value for eps depends on your needs; for the data shown here in df, 1e-8 is likely to be sufficient. For other data, look for a value that is much smaller than the expected range of your values, but still an order of magnitude or more larger than .Machine$double.eps (about 2e-16).
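One possible alternative, sketched here as an assumption rather than as part of the original answer, is dplyr's near() helper, which compares two numeric vectors within a tolerance. The toy tibble below is hypothetical and exists only to show a row where rounding error matters. Because case_when() uses the first condition that matches, the tolerance test only has an effect if it comes before the strict comparisons.

```r
library(dplyr)

# Hypothetical toy data: in the second row, y differs from a only by rounding error.
toy <- tibble(y = c(26.85, 0.1 + 0.2, 26.85),
              a = c(26.67, 0.3,       36.47))

eps <- 1e-8  # illustrative tolerance, same value as in the answer above

result <- toy %>%
  mutate(Side = case_when(near(y, a, tol = eps) ~ "C",     # tolerance test first
                          y > a                 ~ "Left",
                          y < a                 ~ "Right"))

print(result$Side)  # "Left" "C" "Right"
```

With the strict comparisons listed first, the second row would be labelled "Left", since 0.1 + 0.2 is stored as a value slightly greater than 0.3.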

Upvotes: 3
