giuseppe sabino

Reputation: 1

How to solve "Duplicated samples have been found in X" error for DBCV metric

I'm trying to compute the DBCV metric (provided by "git+https://github.com/FelSiq/DBCV") on density-based clusters from a dataset similar to the one shown here:

[image]

The calculation is performed with the following code:

dbcv_score = dbcv(X, labels)

where X represents the dataset, and labels are the labels produced by the DBSCAN algorithm.

I've already tried removing duplicates from X, but the same error persists:

5 frames
ValueError: Duplicated samples have been found in X.

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/dbcv/core.py in _check_duplicated_samples(X, threshold)

ValueError: Duplicated samples have been found in X.

How can I resolve this issue?

Upvotes: 0

Views: 43

Answers (1)

Thanasis

Reputation: 1

You may use:

dbcv_score = dbcv(X, labels, check_duplicates=False)

I'm not very familiar with the library and am still experimenting, but while inspecting the code I found that dbcv accepts a check_duplicates parameter, and setting it to False worked for me. Keep in mind that this only skips the check: any duplicated samples stay in X.
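If you would rather drop the duplicates than skip the check, here is a minimal sketch. The traceback shows _check_duplicated_samples(X, threshold), which suggests (this is my assumption, I haven't verified it) that near-identical points also count as duplicates, so removing only exact duplicates may not be enough. The helper first_occurrence_indices and its decimals parameter are my own hypothetical names, not part of the dbcv package, and rounding is only a crude stand-in for whatever threshold the library actually uses. It returns row indices, so the same indices can be applied to both X and labels and the two stay aligned.

```python
def first_occurrence_indices(rows, decimals=8):
    """Return indices of the first occurrence of each distinct row.

    Hypothetical helper (not part of dbcv). Rows are compared after
    rounding every coordinate to `decimals` places, so points that
    differ only by tiny floating-point noise are treated as duplicates.
    """
    seen = set()
    keep = []
    for i, row in enumerate(rows):
        key = tuple(round(float(v), decimals) for v in row)
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep


# Toy data: row 1 repeats row 0 exactly, row 3 repeats row 2 up to noise.
X = [[0.0, 1.0], [0.0, 1.0], [2.0, 3.0], [2.0, 3.0 + 1e-12]]
labels = [0, 0, 1, 1]

keep = first_occurrence_indices(X)
print(keep)  # [0, 2]

# With NumPy arrays, index X and labels with the same list so they stay aligned:
#   X_unique, labels_unique = X[keep], labels[keep]
#   dbcv_score = dbcv(X_unique, labels_unique)
```

If the error persists after this, lowering decimals merges points that are further apart, at the cost of discarding more samples.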

Upvotes: 0
