James
James

Reputation: 29

How do I reduce the number of ConvexHull Vertices from a Pandas Dataframe

I am using scipy.spatial ConvexHull to create an envelope around a dataset, here is an example that creates the dataset and the envelope using ConvexHull,

import pandas as pd
import numpy as np
from scipy.spatial import ConvexHull, convex_hull_plot_2d

df = pd.DataFrame(np.random.randn(1000, 2), columns=list(['col1', 'col2']))
hull = ConvexHull(df[['col1', 'col2']])
hull_indices = hull.vertices
print(df.iloc[hull_indices])

I would now like to remove a ConvexHull Vertices if it is a close neighbour. The intent is to reduce the number of ConvexHull Vertices.

Could I use scipy.spatial.KDTree to find the close neighbour?

Thanks in advance for any help.

Upvotes: 0

Views: 136

Answers (1)

Nuclearman
Nuclearman

Reputation: 5304

You wouldn't really need to, basically you just window through the convex hull points and see if they are within X distance of each other, if so, create a new point that is the midpoint (mean point) of the two points. If multiple points in a row can be removed, then the new hull point should the midpoint of all sequentially removed points.

Pythonic Pseudocode (might be valid Python assuming the missing methods are valid, my Python is rusty):

new_hull = []
for i in len(hull_points):
  removed = []
  a = hull_points[i]
  for b in hull_points[a+1,-1]:
    - if distance(a,b) >= max_distance
      removed.push(b)
  if len(removed) > 0
    removed.push(a)
    mean_point = calculate_mean_point(removed)
    new_hull.push(mean_point)
  else
    new_hull.push(a)

There is a bug here in that it should be circular. IE: The algorithm should continue after it has reached the last hull point and keep going around in circles until it goes through the entire hull without removing any points.

Upvotes: 0

Related Questions