Reputation: 1073
I have a delivery app, and I want to group orders (each order has a lat and lng coordinates) by location proximity (linear distance) and constraints like max orders and max total products (each order has an amount of products) inside a group.
For proximity grouping I used DBSCAN
coordinates = [[lat,lng],[lat,lng]],[lat,lng]],[lat,lng]],[lat,lng]]]
distance_matrix = squareform(pdist(coordinates, (lambda u,v: haversine(u,v))))
#eps=0.1 => 100m radius, 50m linear
db = DBSCAN(eps=0.1, min_samples=2, metric='precomputed')
results = db.fit(distance_matrix)
How can I add constraints in this functionality?
Is there anyway of doing this by using something else than DBSCAN or HDBSCAN ?
Upvotes: 1
Views: 2535
Reputation: 20322
This is an interesting question. I suppose it can be done a bunch of different ways. Here is one solution for you to condsider.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import seaborn as sns; sns.set()
import csv
df = pd.read_csv('C:\\your_path\\properties_2017.csv')
# df.head(10)
df = df.head(10000)
df.shape
df.dropna(axis=0,how='any',subset=['latitude','longitude'],inplace=True)
# Variable with the Longitude and Latitude
X=df.loc[:,['parcelid','latitude','longitude']]
X.head(10)
K_clusters = range(1,10)
kmeans = [KMeans(n_clusters=i)
for i in K_clusters]
Y_axis = df[['latitude']]
X_axis = df[['longitude']]
score = [kmeans[i].fit(Y_axis).score(Y_axis)
for i in range(len(kmeans))] # Visualize
plt.plot(K_clusters, score)
plt.xlabel('Number of Clusters')
plt.ylabel('Score')
plt.title('Elbow Curve')
plt.show()
kmeans = KMeans(n_clusters = 10, init ='k-means++')
kmeans.fit(X[X.columns[1:3]]) # Compute k-means clustering.X['cluster_label'] = kmeans.fit_predict(X[X.columns[1:3]])centers = kmeans.cluster_centers_ # Coordinates of cluster centers.labels = kmeans.predict(X[X.columns[1:3]]) # Labels of each pointX.head(10)
X['cluster_label'] = kmeans.fit_predict(X[X.columns[1:3]])
centers = kmeans.cluster_centers_ # Coordinates of cluster centers.
labels = kmeans.predict(X[X.columns[1:3]]) # Labels of each pointX.head(10)
X.head(5)
X = X[['parcelid','cluster_label']]
X.head(5)
clustered_data = df.merge(X, left_on='parcelid', right_on='parcelid')
clustered_data.head(5)
centers = kmeans.cluster_centers_
print(centers)
X=df.loc[:,['parcelid','latitude','longitude']]
X.plot.scatter(x = 'latitude', y = 'longitude', c=labels, s=50, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.5)
data = X
labels = kmeans.labels_
plt.subplots_adjust(bottom = 0.1)
plt.scatter(data.iloc[:, 1], data.iloc[:, 2], c=kmeans.labels_, cmap='rainbow')
for label, x, y in zip(labels, data.iloc[:, 1], data.iloc[:, 2]):
plt.annotate(
label,
xy=(x, y), xytext=(-20, 20),
textcoords='offset points', ha='right', va='bottom',
bbox=dict(boxstyle='round,pad=0.5', fc='red', alpha=0.5),
arrowprops=dict(arrowstyle = '->', connectionstyle='arc3,rad=0'))
plt.show()
# labels pointing to each data point (this is a big jumbled together; you should probably select fewer data points to analyze).
Reference:
https://levelup.gitconnected.com/clustering-gps-co-ordinates-forming-regions-4f50caa7e4a1
Data source:
https://www.kaggle.com/c/zillow-prize-1/data
Upvotes: 1
Reputation: 2624
Unfortunately, I think the model you want should be developed from the scratch.
Your question can be modeled as following optimization model.
Objective function: minimize the number of clusters, K Constraint (1) The size of every cluster is equal or smaller than S (parameter) (2) The number of orders in every cluster is equal or smaller than O (parameter) (3) For a sample x and its cluster Ck, dist(x, Ck) be the minimum among dist(x, C1), dist(x, C2), ..., dist(x, CK).
Solving this problem would require a lot of efforts...
Upvotes: 0