Function to Calculate Whether A Point is Within Ellipse using stat_ellipse

Question

The question seems simple enough. I can plot 95% credible/confidence ellipses in ggplot using stat_ellipse. I want to find out how to test whether a given point is within that ellipse.

For example, here is a dataset of points and their 95% ellipses, plus a set of randomvalues. I'd like a function that calculates TRUE or FALSE whether the randompalues falls within the ellipse.

library(tidyverse)

set.seed(420)

randomvalues <- data.frame(x = sample(min(mtcars$hp):max(mtcars$hp), 10),
                           y = sample(min(mtcars$disp):max(mtcars$disp), 10))

ggplot(data = mtcars)+
  geom_point(aes(x = hp, y = disp, color = as.factor(cyl)))+
  stat_ellipse(aes(x = hp, y = disp, color = as.factor(cyl)))+
  geom_point(data = randomvalues, aes(x = x, y = y))

This (very contrived) solution comes close, but when points are close to the border of an ellipse, it fails. The test point here is 310,300, which is just outside of ellipse 8. This function says it falls within ellipse 8.

point_in_ellipse <- function(x, y, points, level = 0.95) {
  
  # Calculate mean of points
  mean_x <- mean(points[, 1])
  mean_y <- mean(points[, 2])
  
  # Calculate covariance matrix of points
  cov_mat <- cov(points)
  
  # Calculate eigenvalues and eigenvectors of covariance matrix
  eigen <- eigen(cov_mat)
  
  # Calculate semi-major and semi-minor axes of ellipse
  lambda_1 <- sqrt(eigen$values[1] * qchisq(level, df = 2))
  lambda_2 <- sqrt(eigen$values[2] * qchisq(level, df = 2))
  
  # Calculate orientation of ellipse
  theta <- atan2(eigen$vectors[2, 1], eigen$vectors[1, 1])
  
  # Convert point to ellipse coordinates
  dx <- x - mean_x
  dy <- y - mean_y
  xp <- dx * cos(theta) + dy * sin(theta)
  yp <- -dx * sin(theta) + dy * cos(theta)
  
  # Calculate distance from point to ellipse center
  d <- sqrt((xp / lambda_1)^2 + (yp / lambda_2)^2)
  
  # Check if point is within ellipse
  if (d <= 1) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

# Split mtcars by cylinder
ellipse.points <- mtcars %>%
  select(hp, disp, cyl) %>%
  split(f = .$cyl)

# Add a randomvalue that is on the border of an ellipse
randomvalues <- rbind(randomvalues, c(310, 300))

# Make a list to receive the answers
answer <- list()

# For each cylinder, calculate the ellipse, and calcualte whether the randomvalues fall inside it
for(i in names(ellipse.points)){
  answer[[i]] <- apply(randomvalues, 1, function(x) point_in_ellipse(x = x[1], y = x[2], ellipse.points[[i]]) )
}

# Convert back to df
final.df <- cbind(randomvalues, do.call(cbind, answer))

I think stat_ellipse must not use the chi squared distribution or the covariance matrix, or must use another way of calculating its ellipses.

Any ideas?

EDIT: 8 May 2022 Based on suggestions from @tjebo, I tried to calculate the ellipses with car::dataEllipse(), then convert the ellipses to sf objects, then use st_intersects to find which points intersect which ellipses.

# Calculate ellipses for each cyl using car::dataEllipse
list.of.cyls <- split(mtcars[,c(4,3)], f = as.factor(mtcars$cyl))

cyl.ellipses <- lapply(list.of.cyls, function(x) car::dataEllipse(x = x$hp, y = x$disp, draw = FALSE) %>%
                           .$`0.95` %>%
                           as_tibble(.) %>%
                           st_as_sf(coords = c("x", "y")) %>%
                           summarise(geometry = st_combine(geometry)) %>%
                           st_cast("POLYGON")
)

# Convert randomvalues to sf
randomvalues.sf <- randomvalues %>%
  st_as_sf(coords = c("x", "y"))

# Already the ellipses are off
ggplot()+
  geom_sf(data = do.call(bind_rows, cyl.ellipses), fill = NA)+
  geom_point(data = randomvalues, aes(x = x, y = y))

ggplot()+
  stat_ellipse(data = mtcars, aes(x = hp, y = disp, color = as.factor(cyl)))+
  geom_point(data = randomvalues, aes(x = x, y = y))

Before we do any intersection calculations, the ellipses are already off. Not sure how my calculation of ellipses using dataEllipses varies from stat_ellipse, but somehow it does.

Anyone have an idea?

Function to Calculate Whether A Point is Within Ellipse using stat_ellipse

Answers (1)

Related Questions