sayem siam

Reputation: 1311

Why do I need to specify number of clusters in OpenCV hierarchical clustering

If we know the number of clusters in the input data, we can use the k-means algorithm. If we don't, we can use a hierarchical clustering algorithm, which determines the number of clusters automatically from a given similarity threshold. Hierarchical clustering comes in two variants: agglomerative (bottom-up) and divisive (top-down). I want to use OpenCV's hierarchical clustering.
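To make the threshold-driven behaviour described above concrete, here is a minimal sketch in plain C++ with no OpenCV dependency. It assumes single-linkage agglomerative clustering cut at a fixed Euclidean distance, which is only one of several possible linkage rules. The names `Pt`, `findRoot`, and `clusterByThreshold` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical 2-D sample type, used only for this illustration.
struct Pt { float x, y; };

// Union-find root lookup with path halving.
static std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Single-linkage clustering cut at maxDist: two points end up in the same
// cluster when a chain of neighbours, each at most maxDist apart, links them.
// Fills `labels` with cluster ids 0..k-1 and returns k, so the cluster count
// is an output of the threshold rather than an input.
int clusterByThreshold(const std::vector<Pt>& pts, float maxDist,
                       std::vector<int>& labels)
{
    assert(maxDist >= 0.0f);
    const std::size_t n = pts.size();
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});

    const float maxDistSq = maxDist * maxDist;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const float dx = pts[i].x - pts[j].x;
            const float dy = pts[i].y - pts[j].y;
            if (dx * dx + dy * dy <= maxDistSq) {
                // Merge the two sets that contain i and j.
                const std::size_t ri = findRoot(parent, i);
                const std::size_t rj = findRoot(parent, j);
                parent[ri] = rj;
            }
        }
    }

    // Compact the set representatives into consecutive labels.
    labels.assign(n, -1);
    std::vector<int> rootLabel(n, -1);
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t r = findRoot(parent, i);
        if (rootLabel[r] < 0) rootLabel[r] = count++;
        labels[i] = rootLabel[r];
    }
    return count;
}
```

The pairwise loop is quadratic in the number of points, which keeps the sketch short but would need a spatial index for large inputs.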

However, OpenCV's hierarchicalClustering function takes centers as a parameter and uses its number of rows as the desired number of clusters, unlike actual hierarchical clustering. To me, OpenCV's hierarchicalClustering looks the same as k-means clustering. Is there any other function in OpenCV that returns the number of clusters based on a given similarity threshold?

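    typedef cv::flann::L2<float> D;

    // Backing storage for a 2 x 2 centers matrix (2 rows = 2 requested clusters).
    float a[] = {0, 0, 0, 0};
    cvflann::Matrix<D::ResultType> centers(a, 2, 2, 0);

    const cvflann::KMeansIndexParams params1(
        2,                              // branching
        100,                            // iterations
        cvflann::FLANN_CENTERS_RANDOM,  // centers_init
        0.2f                            // cb_index
    );

    // features is the input sample matrix, defined elsewhere.
    int number_of_clusters = cvflann::hierarchicalClustering<D>(features,
                                                                centers,
                                                                params1);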


Another parameter we pass in is cb_index = 0.2. Is it a threshold on the distance between clusters, or a cluster bound such as a radius threshold?

Upvotes: 1

Views: 545

Answers (1)

Nuzhny

Reputation: 1927

number_of_clusters is the actual number of clusters found, and it can be less than the size of centers. The number of rows in centers is the maximum cluster count.

See this example:

    // clustering
    Mat1f centers(clusterNum, descriptorNum);
    ::cvflann::KMeansIndexParams kmean_params;
    unsigned int resultClusters = hierarchicalClustering< L2<float> >(samples, centers, kmean_params);
    if (resultClusters < clusterNum)
    {
        centers = centers.rowRange(Range(0, resultClusters));
    }
    Index flann_index(centers, KDTreeIndexParams());
    printf("resulted clusters number: %u\n", resultClusters);
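
The trimming step above is needed because the returned count can fall short of the requested one. As far as I recall from the FLANN manual, the cut through the hierarchical k-means tree yields a count of the form (b-1)*k+1 for branching factor b, since every split replaces one cluster with b children. Under that assumption, the largest count reachable for a given request can be sketched as follows. `maxReachableClusters` is a hypothetical helper, not part of OpenCV or FLANN, and the real result may be lower still when the tree runs out of nodes to split.

```cpp
#include <cassert>

// Largest value of the form 1 + k * (branching - 1) that does not exceed
// `requested`. Assumption: each split replaces one cluster with `branching`
// children, so the count grows in steps of (branching - 1) starting from 1.
int maxReachableClusters(int requested, int branching)
{
    assert(requested >= 1 && branching >= 2);
    const int step = branching - 1;
    return 1 + ((requested - 1) / step) * step;
}
```

With the default branching of 32 in KMeansIndexParams, a request for 100 centers would give at most 94 under this assumption, while the question's setup (branching 2, 2 centers) can reach exactly 2.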

Upvotes: 2
