Reputation: 9
I have clustered geospatial data with the DBSCAN algorithm. You can see the project and the code in more detail here: https://notebook.community/gboeing/urban-data-science/15-Spatial-Cluster-Analysis/cluster-analysis
I would like to calculate the following for each cluster, in a dataframe:
the area of the cluster, taken as its bounding box: (lat_max - lat_min) * (lon_max - lon_min)
the number of points belonging to the cluster
So far I have added a column to the original dataset holding the cluster each coordinate belongs to:
# cluster_labels holds one DBSCAN label per row of df, in the same order
df['cluster'] = pd.Series(cluster_labels, index=df.index)
Any idea of simple code that would allow me to do this?
Upvotes: 0
Views: 289
Reputation: 796
Something like this should work:
import pandas as pd

# Toy data; with only one row per cluster, every area below comes out as 0
df = pd.DataFrame({
    'cluster': [0, 1, 2],
    'pts': [5, 6, 10],
    'lat': [45, 47, 45],
    'lon': [24, 23, 20],
})

# Bounding box of each cluster
df = df.groupby('cluster').agg(
    min_lat=('lat', 'min'),
    max_lat=('lat', 'max'),
    min_lon=('lon', 'min'),
    max_lon=('lon', 'max'),
)

# Area of the bounding box, in squared degrees
df["area"] = (df["max_lat"] - df["min_lat"]) * (df["max_lon"] - df["min_lon"])
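To also get the number of points per cluster, the same groupby can carry a count. Below is a minimal sketch, assuming the dataframe has one row per point with 'cluster', 'lat' and 'lon' columns; the sample values and the names `clustered`, `summary` and `n_points` are made up for illustration. It also assumes you want to leave out the noise points, which scikit-learn's DBSCAN labels as -1.

```python
import pandas as pd

# Hypothetical sample data: one row per point; -1 marks DBSCAN noise points
df = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1, -1],
    "lat": [45.0, 45.5, 46.0, 47.0, 47.5, 50.0],
    "lon": [24.0, 24.5, 25.0, 23.0, 23.25, 10.0],
})

# Assumption: noise points should not be treated as a cluster
clustered = df[df["cluster"] != -1]

# One row per cluster: point count plus bounding box
summary = clustered.groupby("cluster").agg(
    n_points=("lat", "count"),  # counts non-null lat values
    min_lat=("lat", "min"),
    max_lat=("lat", "max"),
    min_lon=("lon", "min"),
    max_lon=("lon", "max"),
)

# Bounding-box area in squared degrees, as defined in the question
summary["area"] = (summary["max_lat"] - summary["min_lat"]) * (
    summary["max_lon"] - summary["min_lon"]
)

print(summary[["n_points", "area"]])
```

Note that this area is in squared degrees, so it is only comparable between clusters at similar latitudes.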
Upvotes: 1