Reputation: 113
I'm working with a large dataset of spatial parcels, where each row contains geographic coordinates (UTM), parcel area and value:
[x, y, area, value]:
[272564.9434265977, 6134243.108910706, 980.63, 550.6664083293393],
[272553.9611341293, 6134209.499155387, 1026.55, 477.32696897374706],
[271292.4197118982, 6132982.047648986, 634.438, 851.1469993915875],
...
Plotting these reveals several distinct zones where dollar value varies with geography (the high-value strip on the left of the plot, for example, is coastal).
I would like to identify clusters of value (i.e. the coastal strip) and have looked at several approaches:
K-means seems the easiest clustering method to implement, but appears unsuitable because it only considers the distance between points and ignores any further attributes.
ClusterPy looks ideal for this application, but its documentation only seems to cover working with GIS files.
DBSCAN seems more relevant, but I'm not sure how I can include my additional attribute ($ value) - perhaps as a third dimension?
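A rough sketch of what I mean by a third dimension (the StandardScaler rescaling and the eps/min_samples values here are pure guesses on my part):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

data = np.array([
    [272564.9434265977, 6134243.108910706, 980.63, 550.6664083293393],
    [272553.9611341293, 6134209.499155387, 1026.55, 477.32696897374706],
    [271292.4197118982, 6132982.047648986, 634.438, 851.1469993915875],
    # ... rest of the parcels
])

xy = data[:, :2]       # UTM coordinates in metres
value = data[:, 3:4]   # dollar value as a column vector

# x/y are metres and value is dollars, so the columns have to be rescaled
# before one Euclidean distance over all three makes sense for DBSCAN.
features = np.hstack([StandardScaler().fit_transform(xy),
                      StandardScaler().fit_transform(value)])

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(features)  # guessed parameters
```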
Can anybody suggest any other toolkits/approaches to consider?
Upvotes: 3
Views: 1725
Reputation: 125
What about creating price contours (like the contours on a geological map)? Instead of a contour connecting points of similar elevation, the contours would connect points of similar price.
You'd get a map of parcels that are "clustered" according to contour intervals (price values), with the contour boundaries defining zones that reflect certain price characteristics.
You could then extract the parcels that lie within each price band (contour interval) and assign them a particular cluster number. Doing this for all the parcels would give you spatially connected "clusters" of prices that reflect the observed data, without needing to rely on complex ML clustering algorithms that never seem to get things right.
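A minimal sketch of the idea, assuming the parcels sit in a NumPy array with the [x, y, area, value] layout from the question; the five price bands and the parcels.csv filename are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file holding the [x, y, area, value] rows from the question.
data = np.loadtxt("parcels.csv", delimiter=",")
x, y, values = data[:, 0], data[:, 1], data[:, 3]

# Price bands play the role of contour intervals; five bands is an arbitrary choice.
band_edges = np.linspace(values.min(), values.max(), num=6)

# Draw filled price contours over the parcel centroids.
plt.tricontourf(x, y, values, levels=band_edges, cmap="viridis")
plt.colorbar(label="parcel value ($)")
plt.scatter(x, y, s=2, c="k")
plt.show()

# Assign each parcel the index of the price band (contour interval) it falls in.
band_labels = np.digitize(values, band_edges[1:-1])
```

Parcels sharing a band label that also form one spatially connected patch would be your clusters; a connected-components pass over neighbouring parcels within each band would split a band into its separate geographic zones.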
Upvotes: 0
Reputation: 285
At least in hierarchical clustering you can define connectivity constraints so that only "connected" samples can belong to the same cluster. In your case, x and y would be used by sklearn.neighbors.kneighbors_graph() to build the list of neighbours, and the value variable would be used in the clustering itself.
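Something along these lines, assuming the parcels are in a NumPy array with the [x, y, area, value] layout from the question; the neighbour count, the number of clusters and the parcels.csv filename are assumptions to tune:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import AgglomerativeClustering

# Hypothetical file holding the [x, y, area, value] rows from the question.
data = np.loadtxt("parcels.csv", delimiter=",")
xy = data[:, :2]       # spatial coordinates (metres)
value = data[:, 3:4]   # dollar value as a column vector

# Connectivity graph built from x/y only: each parcel is linked to its
# k nearest spatial neighbours.
connectivity = kneighbors_graph(xy, n_neighbors=10, include_self=False)

# Cluster on value, but only allow merges between spatially connected parcels.
model = AgglomerativeClustering(
    n_clusters=5,               # assumed number of zones
    connectivity=connectivity,
    linkage="ward",
)
labels = model.fit_predict(value)
```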
Upvotes: 2
Reputation: 77454
Look at generalized DBSCAN (GDBSCAN), which easily allows you to require neighbor points to both be within a chosen spatial distance and have a similar value before they count as neighbors.
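scikit-learn does not ship GDBSCAN itself, but you can approximate the combined neighbourhood predicate by handing plain DBSCAN a precomputed distance matrix in which parcel pairs that differ too much in value are pushed out of reach. The eps, min_samples and value tolerance below are assumptions, and the dense NxN matrix only suits a moderate number of parcels:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import DBSCAN

# Hypothetical file holding the [x, y, area, value] rows from the question.
data = np.loadtxt("parcels.csv", delimiter=",")
xy = data[:, :2]
value = data[:, 3]

spatial_dist = cdist(xy, xy)                           # metres between parcels
value_diff = np.abs(value[:, None] - value[None, :])   # dollar difference

# Neighbourhood predicate: within eps metres AND within value_tol dollars.
# The value condition is encoded by pushing the distance far beyond eps.
value_tol = 100.0    # assumed dollar tolerance
far = 1e12           # large finite sentinel so sklearn's input checks pass
dist = np.where(value_diff < value_tol, spatial_dist, far)

labels = DBSCAN(eps=250.0, min_samples=5, metric="precomputed").fit_predict(dist)
```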
Upvotes: 2