Faisal Aldhuwayhi

Reputation: 23

ValueError: Buffer dtype mismatch, expected 'double_t' but got 'float' - hdbscan validity_index

I'm using the validity index in the hdbscan package, which implements the DBCV score described in the following paper: https://www.dbs.ifi.lmu.de/~zimek/publications/SDM2014/DBCV.pdf

I'm working on a face clustering project, and calling validity_index on my data raises an error.

Here is the code:

dbcv_score_output = hdbscan.validity.validity_index(feature_vectors, archive_labels)
dbcv_score_output

The full error:

hdbscan/validity.py:30: RuntimeWarning: overflow encountered in power
  distance_matrix[distance_matrix != 0] = (1.0 / distance_matrix[

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:371, in validity_index(X, labels, metric, d, per_cluster_scores, mst_raw_dist, verbose, **kwd_args)
    356         continue
    358     distances_for_mst, core_distances[
    359         cluster_id] = distances_between_points(
    360         X,
   (...)
    367         **kwd_args
    368     )
    370     mst_nodes[cluster_id], mst_edges[cluster_id] = \
--> 371         internal_minimum_spanning_tree(distances_for_mst)
    372     density_sparseness[cluster_id] = mst_edges[cluster_id].T[2].max()
    374 for i in range(max_cluster_id):

File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:165, in internal_minimum_spanning_tree(mr_distances)
    136 def internal_minimum_spanning_tree(mr_distances):
    137     """
    138     Compute the 'internal' minimum spanning tree given a matrix of mutual
    139     reachability distances. Given a minimum spanning tree the 'internal'
   (...)
...
    167     for index, row in enumerate(min_span_tree[1:], 1):

File hdbscan/_hdbscan_linkage.pyx:15, in hdbscan._hdbscan_linkage.mst_linkage_core()

ValueError: Buffer dtype mismatch, expected 'double_t' but got 'float'

A quick look at the inputs and their types:

When I converted the features to double/float64, I got a different error:

hdbscan/validity.py:33: RuntimeWarning: invalid value encountered in true_divide
  result /= distance_matrix.shape[0] - 1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/anaconda3/lib/python3.9/site-packages/hdbscan/validity.py:372, in validity_index(X, labels, metric, d, per_cluster_scores, mst_raw_dist, verbose, **kwd_args)
    358     distances_for_mst, core_distances[
    359         cluster_id] = distances_between_points(
    360         X,
   (...)
    367         **kwd_args
    368     )
    370     mst_nodes[cluster_id], mst_edges[cluster_id] = \
    371         internal_minimum_spanning_tree(distances_for_mst)
--> 372     density_sparseness[cluster_id] = mst_edges[cluster_id].T[2].max()
    374 for i in range(max_cluster_id):
    376     if np.sum(labels == i) == 0:

File ~/anaconda3/lib/python3.9/site-packages/numpy/core/_methods.py:40, in _amax(a, axis, out, keepdims, initial, where)
     38 def _amax(a, axis=None, out=None, keepdims=False,
     39           initial=_NoValue, where=True):
---> 40     return umr_maximum(a, axis, None, out, keepdims, initial, where)

ValueError: zero-size array to reduction operation maximum which has no identity

I went through all the related issues and fixes in the repo, but to no avail. Are there any recommendations or fixes?

Upvotes: 1

Views: 199

Answers (1)

Alexey Kiselev

Reputation: 11

I fixed this issue by converting the NumPy array from float (float32) to double (float64). In your case, try:

feature_vectors=feature_vectors.astype('double')

before calling validity_index.
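
A minimal sketch of that conversion, assuming feature_vectors is a float32 NumPy array. The random data is hypothetical and only stands in for the face embeddings from the question. In NumPy, 'double' is an alias for float64, which matches the 'double_t' buffer type named in the traceback.

```python
import numpy as np

# Hypothetical stand-in for the face embeddings: 100 vectors of 128 float32 values.
rng = np.random.default_rng(0)
feature_vectors = rng.normal(size=(100, 128)).astype(np.float32)
print(feature_vectors.dtype)  # float32

# Cast to float64 ('double' is an alias for it). astype returns a new array,
# so the result has to be assigned back.
feature_vectors = feature_vectors.astype(np.float64)
print(feature_vectors.dtype)  # float64

# With the cast in place, the call from the question would be:
# dbcv_score_output = hdbscan.validity.validity_index(feature_vectors, archive_labels)
```

Checking feature_vectors.dtype right before the call is a quick way to confirm the cast took effect.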

Upvotes: 0
