I have a question about clustering. When using the k-means algorithm, you have to specify how many clusters you expect. My problem is that I have several runs in which the number of clusters varies. I checked, and there are some methods for determining how many clusters there are, but they work for two-dimensional problems. In my case, I have three features. Do you have an idea which algorithms I can use for a three-dimensional problem? I would be grateful for any help, because I also did some research myself and could not find anything. :)
In the first example, the algorithm should find two clusters: the single point, and the row of data points as the second cluster.
In the second example, I expect the algorithm to find three clusters automatically: the long line, the short line, and the single point.
Thanks. :)
As @ForceBru said in the comments, the k-means algorithm also works for 3D data. I always use the `sklearn.cluster.KMeans` class when I need to cluster 3D points.
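As a minimal sketch of that point: `KMeans` takes an array of shape `(n_samples, n_features)`, so 3D points are simply rows with three columns. The toy points below are the same ones used in the DBSCAN example further down; `n_clusters=2` has to be chosen by hand here, which is exactly the limitation you describe.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 3D data: a row of points on the diagonal plus one far-away point
points = np.array(
    [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5], [20, 20, 20]],
    dtype=float,
)

# KMeans needs the number of clusters up front; each row is one 3D sample
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("labels:", labels)
print("centers:\n", kmeans.cluster_centers_)
```

Nothing in the call is specific to 2D: the same code works for any number of feature columns.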
Also take a look at this link, which has a simple example to get started.
The key part of the example provided in the link above is the following:
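```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn import datasets

np.random.seed(5)

iris = datasets.load_iris()
X = iris.data    # 150 samples x 4 features
y = iris.target  # true species labels, useful for comparison

# Three KMeans configurations to compare
estimators = [
    ("k_means_iris_8", KMeans(n_clusters=8)),
    ("k_means_iris_3", KMeans(n_clusters=3)),
    ("k_means_iris_bad_init", KMeans(n_clusters=3, n_init=1, init="random")),
]

# Fit each model and inspect the cluster labels it assigns
for name, est in estimators:
    est.fit(X)
    print(name, np.unique(est.labels_))
```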
You can also try the DBSCAN algorithm (though I am not an expert with it). Take a look here.
EDIT
I studied the DBSCAN implementation in `sklearn.cluster` a little, and I also found an interesting SO answer here.
So, when the number of clusters is not known a priori, you can do something like this (I have tried to reproduce your input):
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

# A row of points along the diagonal plus one isolated point
data = np.array(
    [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5], [20, 20, 20]]
)

# eps: neighbourhood radius; min_samples: points (including itself) needed for a core point
model = DBSCAN(eps=2.5, min_samples=2)
pred = model.fit_predict(data)  # same values as model.labels_

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.scatter(data[:, 0], data[:, 1], data[:, 2], c=pred, s=20)
plt.show()

# DBSCAN labels noise points as -1, so len(set(pred)) counts noise as one extra group
print("number of clusters found: {}".format(len(set(pred))))
print('cluster for each point: ', pred)
```
Here is what I get from the code above:
Study the DBSCAN parameters in the documentation (mainly `eps` and `min_samples`) and adjust them to your data.
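For example, `min_samples` decides whether an isolated point is reported as noise (label `-1`) or as its own cluster. Here is a sketch on made-up coordinates that mimic your second example (a long line, a short line and a single point); the coordinates and the helper `count_clusters` are hypothetical, for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical coordinates for the second example (not the real data)
long_line = np.array([[i, i, i] for i in range(10)], dtype=float)       # 10 points
short_line = np.array([[j, 10 + j, j] for j in range(3)], dtype=float)  # 3 points
single = np.array([[30, 30, 30]], dtype=float)                          # 1 isolated point
data = np.vstack([long_line, short_line, single])

def count_clusters(labels):
    """Hypothetical helper: number of clusters, ignoring DBSCAN's noise label -1."""
    return len(set(labels) - {-1})

for min_samples in (2, 1):
    labels = DBSCAN(eps=2.5, min_samples=min_samples).fit_predict(data)
    n_noise = int(np.sum(labels == -1))
    print(f"min_samples={min_samples}: {count_clusters(labels)} clusters, {n_noise} noise point(s)")
```

With `min_samples=1` every point is a core point, so the isolated point forms its own cluster; with `min_samples=2` it becomes noise. `eps=2.5` works here only because the neighbouring points in each line are about 1.73 apart while the groups are much farther apart.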
Finally, here is an overview of many other clustering algorithms; take a look!
Hope it helps!