Reputation: 96
I am trying to implement the code for DBSCAN here: http://en.wikipedia.org/wiki/DBSCAN
The portion I am confused about is
expandCluster(P, NeighborPts, C, eps, MinPts)
add P to cluster C
for each point P' in NeighborPts
if P' is not visited
mark P' as visited
NeighborPts' = regionQuery(P', eps)
if sizeof(NeighborPts') >= MinPts
NeighborPts = NeighborPts joined with NeighborPts'
if P' is not yet member of any cluster
add P' to cluster C
My code is below. As is, it currently returns partial clusters where a point should be density connected even if it is not in the immediate eps neighborhood. My code only returns the first few neighbors for each point.
import numpy
import time
from math import radians, cos, sin, asin, sqrt
import re, math
def haversine(lon1, lat1, lon2, lat2):
"""
Calculate the great circle distance between two points
on the earth (specified in decimal degrees) returned as kilometers
"""
# convert decimal degrees to radians
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
# haversine formula
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
km = 6367 * c
return km
def ST_DBSCAN(points,max_distance,MinPts):
global visited
visited = []
noise = []
cluster_id = 0
clusters = []
in_cluster = []
for p in points:
if p not in visited:
# neighbor_points = []
visited.append(p)
NeighborPts = regionQuery(p,points,max_distance)
if len(NeighborPts) < MinPts:
noise.append(p)
else:
cluster_id = cluster_id + 1
g = expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster)
clusters.append(g)
return clusters
#return len(NeighborPts)
def expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster):
in_cluster.append(p[0])
cluster = []
cluster.append(p[0])
for point in NeighborPts:
if point not in visited:
visited.append(point)
new_neighbors = regionQuery(point,points,max_distance)
if len(new_neighbors) >= MinPts:
new_neighbors.append(NeighborPts)
if point[0] not in in_cluster:
in_cluster.append(point[0])
cluster.append(point[0])
return cluster
def regionQuery(p,points,max_distance):
neighbor_points = []
for j in points:
if j != p:
# print 'P is %s and j is %s' % (p[0],j[0])
dist = haversine(p[1],p[2],j[1],j[2])
if dist <= max_distance:
neighbor_points.append(j)
neighbor_points.append(p)
return neighbor_points
I have a subset below. Points 1 and 5 should be 10.76 km apart so they shouldn't be in the initial query but they should be included in the same cluster because point 5 is density connected through point 3.
pointList = [[1,36.4686,2.8289],
[2,36.4706,2.8589],
[3,36.4726,2.8889],
[4,36.4746,2.9189],
[5,36.4766,2.9489],
[6,36.4786,2.9789],
[7,36.4806,3.0089],
[8,36.4826,3.0389],
[9,36.4846,3.0689],
[10,36.4866,3.0989]]
points= pointList
g = ST_DBSCAN(points,10,3)
Upvotes: 1
Views: 731
Reputation: 77454
Your expandCluster
function forgets the new neighbors.
Your set update is swapped.
Upvotes: 1