Terry
Terry

Reputation: 113

Identifying spatial clusters in Python with consideration to additional attributes

I'm working with a large set comprising of spatial parcels, with each row containing geographic coordinates (UTM), parcel area & value:

[x, y, area, value]:
[272564.9434265977, 6134243.108910706, 980.63, 550.6664083293393],
[272553.9611341293, 6134209.499155387, 1026.55, 477.32696897374706],
[271292.4197118982, 6132982.047648986, 634.438, 851.1469993915875],
...

Plotting these visually identifies several distinct zones where dollar value varies based on geography (the high value strip on the left is coastal, for example):scatter plot, x/y/value

I would like to identify clusters of value (ie the coastal strip) & have looked at several approaches;

Can anybody suggest any other toolkits/approaches to consider?

Upvotes: 3

Views: 1725

Answers (3)

unbutu
unbutu

Reputation: 125

What about creating price contours? (like the type of contours in geological maps). Instead of a contour connecting points of similar elevation the contours would connect points of similar price.

You'd get a map of parcels that are "clustered" according to contour intervals (price values) but with the contour boundaries defining zones reflecting certain price characteristics.

You could then extract the parcels that lie within each price band (contour) and assign them a particular cluster number. Doing this for all the parcels would give you spatially connected "clusters" of prices that truly reflect the observed data without needing to rely on complex ML clustering algorithms that never seem to get things right.

Upvotes: 0

Leo Martins
Leo Martins

Reputation: 285

At least in hierarchical clustering you can define connectivity constraints such that only "connected" samples can belong to same cluster. In your case x and y would be used by function sklearn.neighbors.kneighbors_graph() to create the list of neighbours, and the value variable will be used in the clustering.

Upvotes: 2

Has QUIT--Anony-Mousse
Has QUIT--Anony-Mousse

Reputation: 77454

Look at generalized DBSCAN (GDBSCAN), which easily allows you to require neighbor points to both

  1. be within a distance threshold (as in regular DBSCAN)
  2. have the price vary by at most 10% (to prevent the coastal area from merging with the regular)

Upvotes: 2

Related Questions