mech
mech

Reputation: 96

DBSCAN returns partial clusters

I am trying to implement the code for DBSCAN here: http://en.wikipedia.org/wiki/DBSCAN

The portion I am confused about is

expandCluster(P, NeighborPts, C, eps, MinPts) add P to cluster C for each point P' in NeighborPts if P' is not visited mark P' as visited NeighborPts' = regionQuery(P', eps) if sizeof(NeighborPts') >= MinPts NeighborPts = NeighborPts joined with NeighborPts' if P' is not yet member of any cluster add P' to cluster C

My code is below. As is, it currently returns partial clusters where a point should be density connected even if it is not in the immediate eps neighborhood. My code only returns the first few neighbors for each point.

import numpy 
import time 
from math import radians, cos, sin, asin, sqrt
import re, math


def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees) returned as kilometers 
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    km = 6367 * c
    return km



def ST_DBSCAN(points,max_distance,MinPts):
    global visited
    visited = []
    noise = []
    cluster_id = 0
    clusters = []
    in_cluster = []
    for p in points: 
        if p not in visited:
            # neighbor_points = []
            visited.append(p)
            NeighborPts = regionQuery(p,points,max_distance)
            if len(NeighborPts) < MinPts:
                noise.append(p)
            else:
                cluster_id = cluster_id + 1
                g = expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster)
                clusters.append(g)
    return clusters

#return len(NeighborPts)

def expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster):
    in_cluster.append(p[0])
    cluster = []
    cluster.append(p[0])
    for point in NeighborPts:
        if point not in visited:
            visited.append(point)
            new_neighbors = regionQuery(point,points,max_distance)
            if len(new_neighbors) >= MinPts: 
                new_neighbors.append(NeighborPts)
            if point[0] not in in_cluster:
                 in_cluster.append(point[0])
                 cluster.append(point[0])             
    return  cluster




def regionQuery(p,points,max_distance):
    neighbor_points = []
    for j in points:
        if j != p:
           # print 'P is %s and j is %s' % (p[0],j[0])
            dist = haversine(p[1],p[2],j[1],j[2])
            if dist <= max_distance:
                neighbor_points.append(j)
    neighbor_points.append(p) 
    return neighbor_points   

I have a subset below. Points 1 and 5 should be 10.76 km apart so they shouldn't be in the initial query but they should be included in the same cluster because point 5 is density connected through point 3.

pointList = [[1,36.4686,2.8289], 
[2,36.4706,2.8589], 
[3,36.4726,2.8889],
[4,36.4746,2.9189],
[5,36.4766,2.9489], 
[6,36.4786,2.9789],
[7,36.4806,3.0089], 
[8,36.4826,3.0389], 
[9,36.4846,3.0689], 
[10,36.4866,3.0989]]

points= pointList

g = ST_DBSCAN(points,10,3)

Upvotes: 1

Views: 731

Answers (1)

Has QUIT--Anony-Mousse
Has QUIT--Anony-Mousse

Reputation: 77454

Your expandCluster function forgets the new neighbors.

Your set update is swapped.

Upvotes: 1

Related Questions