Msilvy

Reputation: 1

Clustering based on location and timestamp

I have a dataset with longitude, latitude, and timestamp columns. I want to use hierarchical clustering to group points that are within x miles and t duration of each other. I understand I can use the hclust and dbscan functions, but each of these takes only one argument. Moreover, some of my points might not belong to any cluster, so I guess I can't use st_dbscan.

Can anyone point me to a function, package, or argument I can use for this purpose?

Upvotes: 0

Views: 476

Answers (1)

ASH

Reputation: 20362

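I think you want k-means clustering. The example below clusters on latitude and longitude only, then revises the cluster labels by zip code.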

# import necessary modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from collections import Counter


df = pd.read_csv('C:\\your_path_here\\properties.csv')
# df.head(10)
df = df.head(10000)


list(df)
df.shape

df = df.sample(frac=0.2, replace=True, random_state=1)
df.shape


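# Inspect missing values per column. Filling every NaN with 0 here would
# place rows with missing coordinates at (0, 0), so they are handled below.
df.isna().sum()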


# Fill missing zip codes with a placeholder and drop rows without coordinates
df['regionidzip'] = df['regionidzip'].fillna(97000)
df = df.dropna(axis=0, how='any', subset=['latitude', 'longitude'])
X = df.loc[:, ['latitude', 'longitude']]
zp = df.regionidzip



# number of clusters (chosen by hand)
id_n = 8
kmeans = KMeans(n_clusters=id_n, random_state=0).fit(X)
id_label = kmeans.labels_


# plot result
ptsymb = np.array(['b.', 'r.', 'm.', 'g.', 'c.', 'k.', 'b*', 'r*', 'm*', 'r^'])
plt.figure(figsize=(12, 12))
plt.ylabel('Longitude', fontsize=12)
plt.xlabel('Latitude', fontsize=12)
for i in range(id_n):
    # np.where returns positions, so index the underlying arrays, not the
    # Series labels (the sampled frame no longer has a 0..n-1 index)
    cluster = np.where(id_label == i)[0]
    plt.plot(X.latitude.values[cluster], X.longitude.values[cluster], ptsymb[i])
plt.show()

 

# revise the clustering based on zip code:
# give every point in a zip code that zip code's most common cluster label
uniq_zp = np.unique(zp)
for i in uniq_zp:
    a = np.where(zp.values == i)[0]
    c = Counter(id_label[a])
    id_label[a] = c.most_common(1)[0][0]

# plot result (revised)
plt.figure(figsize=(12, 12))
plt.ylabel('Longitude', fontsize=12)
plt.xlabel('Latitude', fontsize=12)
for i in range(id_n):
    # positional indexing on the underlying arrays, as in the first plot
    cluster = np.where(id_label == i)[0]
    plt.plot(X.latitude.values[cluster], X.longitude.values[cluster], ptsymb[i])
plt.show()
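
One caveat about the code above: KMeans measures Euclidean distance on the raw latitude and longitude values, which is not a distance in miles. If you need a threshold like "within x miles", a great-circle (haversine) distance is the usual substitute. A minimal sketch, assuming coordinates in decimal degrees and a mean Earth radius of roughly 3958.8 miles; the name haversine_miles is my own, not from any library:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # clamp to [0, 1] to guard against floating-point overshoot before asin
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, max(0.0, a))))


# One degree of latitude is roughly 69 miles
one_degree = haversine_miles(0.0, 0.0, 1.0, 0.0)
```

This works on one pair of points at a time, so it is only meant to show the formula; for a full dataset you would want a vectorized version.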

 


Data source:

https://www.kaggle.com/c/zillow-prize-1/data

https://www.kaggle.com/xxing9703/kmean-clustering-of-latitude-and-longitude?select=zillow_data_dictionary.xlsx

Also see:

https://www.analyticsvidhya.com/blog/2019/05/beginners-guide-hierarchical-clustering/

https://levelup.gitconnected.com/clustering-gps-co-ordinates-forming-regions-4f50caa7e4a1
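
For the "within x miles and t duration" part of the question, one possible approach (a sketch, not a tested recipe for your data) is to fold both thresholds into a single precomputed distance and hand that to scikit-learn's DBSCAN. Scale the spatial distance by x and the time difference by t, take the larger of the two, and use eps=1.0: two points are then neighbours only if they satisfy both limits. Points with no neighbours get the label -1, which covers the case where some points belong to no cluster. The function name spacetime_dbscan, the parameter names, and the sample coordinates are all made up for illustration, and timestamps are assumed to be already converted to seconds.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def spacetime_dbscan(lat, lon, t_seconds, max_miles, max_seconds, min_samples=2):
    """Label points with DBSCAN, where neighbours must be within
    max_miles AND max_seconds of each other. Noise points get -1."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    t = np.asarray(t_seconds, dtype=float)

    # Pairwise haversine distance in miles (n x n matrix, so O(n^2) memory)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    miles = 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    # Pairwise absolute time difference in seconds
    dt = np.abs(t[:, None] - t[None, :])

    # Scaled distance is <= 1 exactly when both thresholds are satisfied
    scaled = np.maximum(miles / max_miles, dt / max_seconds)

    model = DBSCAN(eps=1.0, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(scaled)


# Hypothetical sample: three nearby points close in time, one far away in
# space, and one at the same place but much later
lat = [34.0000, 34.0010, 34.0005, 40.0, 34.0000]
lon = [-118.0000, -118.0010, -118.0000, -75.0, -118.0000]
t = [0, 60, 120, 30, 100000]

labels = spacetime_dbscan(lat, lon, t, max_miles=1.0, max_seconds=600)
print(labels)
```

Note that DBSCAN chains neighbours together, so two points in the same cluster can end up farther apart than x miles or t duration if they are linked through intermediate points. If every pair within a cluster must satisfy both limits, complete-linkage hierarchical clustering on the same scaled matrix (for example scipy's linkage with method="complete", cut with fcluster at a distance of 1.0) is an alternative worth trying.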

Upvotes: 0
