Rasmus
Rasmus

Reputation: 69

Shrink convex hull

I have a bunch of points in 2D space and have calculated a convex hull for them. I would now like to "tighten" the hull so that it no longer necessarily encompasses all points. In the typical nails-in-board-with-rubber-band analogy, what I'd like to achieve is to be able to tune the elasticity of the rubber band and allow nails to bend at pressure above some limit. That's just an analogy, there is no real physics here. This would kind-of be related to the reduction in hull area if a given point was removed, but not quite because there could be two points that are very close to each-other. This is not necessarily related to outlier detection, because you could imagine a pattern where a large fractions of the nails would bend if they are on a narrow line (imagine a hammer shape for example). All of this has to be reasonably fast for thousands of points. Any hints where I should look in terms of algorithms? An implementation in R would be perfect, but not needed.

enter image description here

EDIT AFTER COMMENT: The three points I've labelled are those with largest potential for reducing the hull area if they are excluded. In the plot there is no other set of three points that would result in a larger area reduction. A naive implementation of what I'm looking for would maybe be to randomly sample some fraction of the points, calculate the hull area, remove each point on the hull iteratively, recalculate the area, repeat many times and remove points that tend to lead to high area reduction. Maybe this could be implemented in some random forest variant? It's not quite right though, because I would like the points to be removed one by one so that you get the following result. If you looked at all points in one go it would possibly be best to trim from the edges of the "hammer head".

enter image description here

Upvotes: 0

Views: 1364

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269854

Calculate a robust center and variance using mcd.cov in MASS and the mahalanobis distance of each point from it (using mahalanobis in psych). We then show a quantile plot of the mahalanobis distances using PlotMD from modi and also show the associated outliers in red in the second plot. (There are other functions in modi that may be of interest as well.)

library(MASS)
library(modi)
library(psych)

set.seed(69)   
x <- runif(20)
y <- runif(20)
m <- cbind(x, y)

mcd <- cov.mcd(m)
md <- mahalanobis(m, mcd$center, mcd$cov)
stats <- PlotMD(md, 2, alpha = 0.90)

giving:

(continued after screenshot) screenshot

and we show the convex hull using lines and the outliers in red:

plot(m)    
ix <- chull(m)
lines(m[c(ix, ix[1]), ])

wx <- which(md > stats$halpha)
points(m[wx, ], col = "red", pch = 20)

screenshot

Upvotes: 1

Rasmus
Rasmus

Reputation: 69

Thank you both! I've tried various methods for outlier detection, but it's not quite what I was looking for. They have worked badly due to weird shapes of my clusters. I know I talked about convex hull area, but I think filtering on segment lengths yields better results and is closer to what I really wanted. Then it would look something like this:

shrink <- function(xy, max_length = 30){
      to_keep <- 1:(dim(xy)[1])
      centroid <- c(mean(xy[,1]), mean(xy[,2]))

      while (TRUE){
        ss <- chull(xy[,1], xy[,2])
        ss <- c(ss, ss[1])
        lengths <- sapply(1:(length(ss)-1), function(i) sum((xy[ss[i+1],] - xy[ss[i],])^2))

        # This gets the point with the longest convex hull segment. chull returns points
        # in clockwise order, so the point to remove is either this one or the one
        # after it. Remove the one furthest from the centroid.
        max_point <- which.max(lengths)
        if (lengths[max_point] < max_length) return(to_keep)

        if (sum((xy[ss[max_point],] - centroid)^2) > sum((xy[ss[max_point + 1],] - centroid)^2)){
          xy <- xy[-ss[max_point],]
          to_keep <- to_keep[-ss[max_point]]
        }else{
          xy <- xy[-ss[max_point + 1],]
          to_keep <- to_keep[-ss[max_point + 1]]
        }
      }
    }

It's not optimal because it factors in the distance to the centroid, which I would have liked to avoid, and there is a max_length parameter that should be calculated from the data instead of being hard-coded.

No filter: enter image description here

It looks like this because there are 500 000 cells in here, and there are many that end up "wrong" when projecting from ~20 000 dimensions to 2.

Filter: enter image description here

Note that it filters out points at tips of some clusters. This is less-than-optimal but ok. The overlap between some clusters is true and should be there.

Upvotes: 0

Allan Cameron
Allan Cameron

Reputation: 174278

Suppose I have a set of points like this:

set.seed(69)

x <- runif(20)
y <- runif(20)

plot(x, y)

enter image description here

Then it is easy to find the subset points that sit on the convex hull by doing:

ss <- chull(x, y)

This means we can plot the convex hull by doing:

lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "red")

enter image description here

Now we can randomly remove one of the points that sits on the convex hull (i.e. "bend a nail") by doing:

bend <- ss[sample(ss, 1)]

x <- x[-bend]
y <- y[-bend]

And we can then repeat the process of finding the convex hull of this new set of points:

ss <- chull(x, y)
lines(x[c(ss, ss[1])], y[c(ss, ss[1])], col = "blue", lty = 2)

enter image description here

To get the point which will, on removal, cause the greatest reduction in area, one option would be the following function:

library(sp)

shrink <- function(coords)
{
  ss <- chull(coords[, 1], coords[, 2])
  outlier <- ss[which.min(sapply(seq_along(ss), 
                function(i) Polygon(coords[ss[-i], ], hole = FALSE)@area))]
  coords[-outlier, ]
}

So you could do something like:

coords <- cbind(x, y)

new_coords <- shrink(coords)

new_chull <- new_coords[chull(new_coords[, 1], new_coords[, 2]),]
new_chull <- rbind(new_chull, new_chull[1,])

plot(x, y)

lines(new_chull[,1], new_chull[, 2], col = "red")

enter image description here

Of course, you could do this in a loop so that new_coords is fed back into shrink multiple times.

Upvotes: 2

Related Questions