Reputation: 120
I have an array (M x N) of air pressure data (gridded model data). There are also two arrays (also M x N) for latitudes and longitudes. To build a GeoJSON of isobars (lines of equal pressure) I need to find clusters of pressure values with a given step (1 Pa, 0.5 Pa). In general, I was thinking of solving it like this:
But step 3 is not yet clear to me: how do I find clusters in a smart way? Which algorithm should I look for? Can I do that with the scipy.cluster package?
Upvotes: 2
Views: 5677
Reputation: 77505
I don't think you are looking for clustering at all.
Apparently the isobar ranges are given. So split your data set on them; you do not need to sort for this - just find the minimum and maximum to get all buckets, then select data according to each bucket separately. This breaks the problem down nicely into smaller chunks.
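The bucketing described above could look roughly like the sketch below. It assumes NumPy arrays without NaNs and bucket edges aligned to multiples of the step; the function name `bucket_by_pressure` and the sample values are made up for illustration.

```python
import numpy as np

def bucket_by_pressure(pressure, step):
    """Assign each grid cell to a pressure bucket of width `step`.

    Hypothetical helper: assumes `pressure` contains no NaNs.
    """
    p_min = pressure.min()
    p_max = pressure.max()
    # Align the first edge to a multiple of `step` at or below the minimum.
    first_edge = np.floor(p_min / step) * step
    n_buckets = int(np.floor((p_max - first_edge) / step)) + 1
    edges = first_edge + step * np.arange(n_buckets + 1)
    # Bucket index per cell, same shape as `pressure`; no sorting needed.
    idx = np.floor((pressure - first_edge) / step).astype(int)
    return edges, idx

# Made-up 2 x 3 grid, only to show the shapes involved.
pressure = np.array([[1000.0, 1000.4, 1001.2],
                     [1000.6, 1001.0, 1001.9]])
lat = np.array([[50.0, 50.0, 50.0],
                [50.1, 50.1, 50.1]])
lon = np.array([[10.0, 10.1, 10.2],
                [10.0, 10.1, 10.2]])

edges, idx = bucket_by_pressure(pressure, step=0.5)

# Select the (lat, lon) points of each bucket separately.
buckets = {}
for b in range(len(edges) - 1):
    mask = idx == b
    buckets[(edges[b], edges[b + 1])] = np.column_stack([lat[mask], lon[mask]])
```

Each entry of `buckets` is then a small, independent point set that can be processed on its own.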
I guess your problem is largely a visualization one. You want to display areas of similar pressure instead of points, right?
Instead of looking at statistical methods such as least-squares optimization (k-means), which require you to predefine the parameter k, consider looking at visualization techniques such as Alpha Shapes (closely related to convex hulls, but they also allow non-convex shapes). If you compute alpha shapes for each of your pressure domains, you should get a nice visualization of these regions.
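One common way to approximate an alpha shape is to compute a Delaunay triangulation and keep only the triangles whose circumradius is below a threshold. The sketch below does that with `scipy.spatial.Delaunay`. It is not a formal alpha-shape implementation: it treats the coordinates as planar (reasonable only for small regions), and the names `alpha_shape_triangles`, `boundary_edges` and `max_radius` are invented for this example.

```python
from collections import Counter

import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_triangles(points, max_radius):
    """Keep Delaunay triangles whose circumradius is below `max_radius`."""
    tri = Delaunay(points)
    kept = []
    for simplex in tri.simplices:
        a, b, c = points[simplex]
        la = np.linalg.norm(b - c)
        lb = np.linalg.norm(a - c)
        lc = np.linalg.norm(a - b)
        s = (la + lb + lc) / 2.0
        # Heron's formula; clamp at 0 to guard against rounding.
        area = np.sqrt(max(s * (s - la) * (s - lb) * (s - lc), 0.0))
        if area == 0.0:
            continue  # skip degenerate (collinear) triangles
        if la * lb * lc / (4.0 * area) < max_radius:
            kept.append([int(i) for i in simplex])
    return kept

def boundary_edges(triangles):
    """Edges that belong to exactly one kept triangle form the outline."""
    counts = Counter()
    for i, j, k in triangles:
        for edge in ((i, j), (j, k), (i, k)):
            counts[tuple(sorted(edge))] += 1
    return [edge for edge, n in counts.items() if n == 1]

# Two made-up, well-separated groups of points in one pressure bucket.
group_a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
group_b = group_a + np.array([10.0, 0.0])
points = np.vstack([group_a, group_b])

kept = alpha_shape_triangles(points, max_radius=2.0)
outline = boundary_edges(kept)
```

With a suitable `max_radius`, the long triangles that bridge the two groups are dropped, so the outline follows each group separately instead of one convex hull around both. The threshold has to be tuned to the grid spacing.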
If you insist on using clustering, have a look at DBSCAN, mostly because it allows non-convex clusters and can work with latitude and longitude (k-means can't). But even hierarchical agglomerative clustering (HAC) may give you good results, since you can define your cut threshold based on your data resolution (e.g. merge any points in the same pressure bucket if they are less than 1 km apart).
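As a rough sketch of the HAC variant, the code below uses single linkage from `scipy.cluster.hierarchy` on great-circle distances and cuts the tree at a distance threshold. The haversine helper, the Earth radius constant, the function names and the sample points are assumptions made for illustration. It builds a full pairwise distance matrix, so it only suits buckets with a modest number of points (at least two).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances in km between all points."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def split_bucket_into_regions(lat_deg, lon_deg, max_gap_km):
    """Label points so that chains of neighbours closer than `max_gap_km` merge."""
    dist = haversine_matrix(lat_deg, lon_deg)
    condensed = squareform(dist, checks=False)  # square -> condensed form
    tree = linkage(condensed, method="single")
    # Cut the dendrogram at the chosen distance threshold.
    return fcluster(tree, t=max_gap_km, criterion="distance")

# Made-up points from one pressure bucket: two close together, one far away.
lat_pts = np.array([50.000, 50.005, 50.000])
lon_pts = np.array([10.000, 10.000, 11.000])

labels = split_bucket_into_regions(lat_pts, lon_pts, max_gap_km=1.0)
```

Here the first two points are about 0.56 km apart and end up with the same label, while the third gets its own. Single linkage is used because it matches the "merge anything closer than the threshold" rule; other linkage methods would behave differently.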
Upvotes: 1