arahant
arahant

Reputation: 2203

finding very close points on plane - approximate clustering algorithm needed

I have many points (latitudes and longitudes) on a plane (a city) and I want to find two clusters. Cluster 1 is points cluttered close together and Cluster 2 is everything else.

I know the definition of the problem is not exact. The only thing defined is that I need exactly 2 clusters. Out of N points, how many end up in cluster 1 or cluster 2 is not defined.

The main aim is to identify points which are very close to each other and separate them from the rest (which are more more evenly spread out)

The best I can think of is the following algorithm:

1. For each point, Calculate the sum of the square distances to all other points.
2. Run the k-means with k=2 on these square distances

The squaring (or maybe even higher order) of the distance should help by raising the dimensionality. However this algorithm will be biased towards points near the center of the city. It will struggle to find clusters at the edges of the city.

Any suggestions on how to avoid this problem? And any other suggestions to improve this algorithm

Upvotes: 0

Views: 1156

Answers (2)

fuch
fuch

Reputation: 156

Since points in cluster 1 are close to each other, I think a density-based clustering algorithm may help. You may try the OPTICS algorithm, which is similar to DBSCAN but is aware of varying density and the number of clusters can be specified by the user.

Upvotes: 1

collapsar
collapsar

Reputation: 17238

i'd suggest something along the following lines:

key concept

count number of neighbouring points at distance less than a given value.

semiformal description

  1. count number nc(P) of neighbouring points at distance less than a given value d_cutoff for each point P.
  2. cluster all points P_i with nc(P_i) greater than a threshold thres_count into cluster #1.
  3. for each P_i in cluster #1 add its close neighbors, i.e. points Q with d(Q, P_i) < d_cutoff to the very same cluster #1.
  4. set cluster #2 to the complement of cluster #1.

algorithmic angle

  1. build an undirected graph G=(V, E) with your points being the vertex set V and an edge between every pair of points at a distance less than d_cutoff from each other.
  2. delete all edges e=(v,w) from the graph where deg(v) < thres_count and deg(w) < thres_count.
  3. G's isolated vertices form cluster #2, the complement is cluster #1.

heuristic on how to choose d_cutoff

build a minimum spanning tree (mst) of your point set. the frequency distribution of edge lengths should hint at suitable cutoff values. short pairwise distances will be incorporated into the mst first. thus there should be at least one pronounced gap in the ordered sequence of edge lengths for point sets with a natural clustering. so partition the set of mst edge lengths into a small number of adjacent intervals, ordering these intervals in the natural way. count how many actual distance values fall into each interval. consider the map between an interval's ordinal number and its count of distance values. large deltas between functions values for successive arguments would suggest to take the upper bound of distances in the lower interval as d_cutoff.

Upvotes: 1

Related Questions