Creating a dataframe from dbscan clustering results

Question

I have clustered geospatial data with the DBSCAN algorithm. The project and the code are described in more detail here: https://notebook.community/gboeing/urban-data-science/15-Spatial-Cluster-Analysis/cluster-analysis

I would like to calculate the following in a dataframe:

the area of each cluster, calculated as (lat_max - lat_min) * (lon_max - lon_min)
the number of points belonging to each cluster

So far I have added a column to the original dataset holding the cluster that each coordinate belongs to:

# cluster_labels holds one DBSCAN label per row of df, so a single assignment is enough
df['cluster'] = pd.Series(cluster_labels, index=df.index)

Is there a simple way to compute this?

Dimitrius · Accepted Answer

Group the rows by cluster label and aggregate the minimum and maximum of each coordinate. The code is something like:

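import pandas as pd

# Toy data: only one row per cluster, so every area below comes out as 0
df = pd.DataFrame({
    'cluster': [0, 1, 2],
    'pts': [5, 6, 10],
    'lat': [45, 47, 45],
    'lon': [24, 23, 20],
})

# One row per cluster with the bounding box of its coordinates
df = df.groupby('cluster').agg(
    min_lat=('lat', 'min'),
    max_lat=('lat', 'max'),
    min_lon=('lon', 'min'),
    max_lon=('lon', 'max'),
)

# Bounding-box area: (lat_max - lat_min) * (lon_max - lon_min)
df["area"] = (df["max_lat"] - df["min_lat"]) * (df["max_lon"] - df["min_lon"])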
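
The question also asks for the number of points in each cluster, which the snippet above does not compute. Here is a minimal sketch of both results together, assuming one row per point and a 'cluster' column holding the DBSCAN labels. The sample coordinates and the names points, clustered and summary are made up for illustration, and dropping the label -1 assumes scikit-learn's convention of marking noise points with -1.

```python
import pandas as pd

# Hypothetical sample: one row per point, 'cluster' holds the DBSCAN label.
# Assumption: noise points carry the label -1 (scikit-learn's convention).
points = pd.DataFrame({
    'cluster': [0, 0, 0, 1, 1, -1],
    'lat': [45.0, 45.5, 46.0, 47.0, 47.5, 50.0],
    'lon': [24.0, 24.5, 25.0, 23.0, 23.5, 10.0],
})

# Drop noise points so they are not summarised as a cluster of their own.
clustered = points[points['cluster'] != -1]

# One row per cluster: point count plus the bounding box of the coordinates.
summary = clustered.groupby('cluster').agg(
    n_points=('lat', 'size'),
    min_lat=('lat', 'min'),
    max_lat=('lat', 'max'),
    min_lon=('lon', 'min'),
    max_lon=('lon', 'max'),
)

# Bounding-box area, using the formula from the question.
summary['area'] = (
    (summary['max_lat'] - summary['min_lat'])
    * (summary['max_lon'] - summary['min_lon'])
)

print(summary[['n_points', 'area']])
```

Note that this area is in squared degrees, as in the question's formula, not a physical area such as square kilometres.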
