sander

Reputation: 1440

Sklearn extra: KMedoids missing 'method' parameter

According to the scikit-learn-extra documentation, KMedoids should accept the following parameters: n_clusters, metric, method, init, max_iter and random_state. The method parameter selects which algorithm to use: 'alternate' or 'pam'. According to the scikit-learn-extra user guide, these two algorithms are inherently different, and for my application I want the PAM variant of k-medoids. However, the method parameter seems to have disappeared. When I inspect the KMedoids constructor:

import inspect
from sklearn_extra.cluster import KMedoids
inspect.getargspec(KMedoids)

I get the following output:

ArgSpec(args=['self', 'n_clusters', 'metric', 'init', 'max_iter', 'random_state'], varargs=None, keywords=None, defaults=(8, 'euclidean', 'heuristic', 300, None))

The method parameter is missing here as well, even though it still appears in the KMedoids source code. Does anyone know where the parameter has gone? I cannot find anything about it online.

Upvotes: 1

Views: 2174

Answers (2)

desertnaut

Reputation: 60390

There seems to be a discrepancy between the latest GitHub version (and the corresponding documentation) and the latest version available on PyPI (currently dated 29 March 2020).

If we install from PyPI with pip,

pip install scikit-learn-extra

and then inspect with

print(inspect.getsource(KMedoids))

which prints the source code of the package as installed on our machine, we get

class KMedoids(BaseEstimator, ClusterMixin, TransformerMixin):
    """k-medoids clustering.

    Read more in the :ref:`User Guide <k_medoids>`.

    Parameters
    ----------
    n_clusters : int, optional, default: 8
        The number of clusters to form as well as the number of medoids to
        generate.

    metric : string, or callable, optional, default: 'euclidean'
        What distance metric to use. See :func:metrics.pairwise_distances

    init : {'random', 'heuristic', 'k-medoids++'}, optional, default: 'heuristic'
        Specify medoid initialization method. 'random' selects n_clusters
        elements from the dataset. 'heuristic' picks the n_clusters points
        with the smallest sum distance to every other point. 'k-medoids++'
        follows an approach based on k-means++_, and in general, gives initial
        medoids which are more separated than those generated by the other methods.
        
        .. _k-means++: https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf

    max_iter : int, optional, default : 300
        Specify the maximum number of iterations when fitting.

    random_state : int, RandomState instance or None, optional
        Specify random state for the random number generator. Used to
        initialise medoids when init='random'.

and

    def __init__(
        self,
        n_clusters=8,
        metric="euclidean",
        init="heuristic",
        max_iter=300,
        random_state=None,
    ):
        self.n_clusters = n_clusters
        self.metric = metric
        self.init = init
        self.max_iter = max_iter
        self.random_state = random_state

i.e. the method argument is indeed nowhere to be seen.

Installing from GitHub:

pip install git+https://github.com/scikit-learn-contrib/scikit-learn-extra.git

and

inspect.getfullargspec(KMedoids)

does indeed give

FullArgSpec(args=['self', 'n_clusters', 'metric', 'method', 'init', 'max_iter', 'random_state'], varargs=None, varkw=None, defaults=(8, 'euclidean', 'alternate', 'heuristic', 300, None), kwonlyargs=[], kwonlydefaults=None, annotations={})

Interestingly, in both cases (PyPI and GitHub) the reported version is exactly the same (0.1.0b2), so the version string cannot tell the two builds apart; something seems to have gone wrong here in terms of good software development practice...
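Since the version string is the same for both builds, one way to tell them apart is to check the constructor signature directly. Below is a minimal sketch using inspect.signature. The helper name accepts_param and the two stand-in classes are hypothetical; the stand-ins only copy the two signatures shown above so that the snippet runs without scikit-learn-extra installed.

```python
import inspect


def accepts_param(cls, name):
    """Return True if the constructor of `cls` takes a parameter `name`."""
    # inspect.signature on a class reports the __init__ parameters
    # without `self`.
    return name in inspect.signature(cls).parameters


# Hypothetical stand-in mirroring the PyPI signature shown above.
class PyPIStyleKMedoids:
    def __init__(self, n_clusters=8, metric="euclidean", init="heuristic",
                 max_iter=300, random_state=None):
        pass


# Hypothetical stand-in mirroring the GitHub signature shown above.
class GitHubStyleKMedoids:
    def __init__(self, n_clusters=8, metric="euclidean", method="alternate",
                 init="heuristic", max_iter=300, random_state=None):
        pass


print(accepts_param(PyPIStyleKMedoids, "method"))    # False
print(accepts_param(GitHubStyleKMedoids, "method"))  # True
```

With the real package, calling accepts_param(KMedoids, "method") after from sklearn_extra.cluster import KMedoids should indicate which of the two builds is actually installed.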

Upvotes: 1

mujjiga

Reputation: 16916

The method parameter is available in the latest development version. So uninstall the version you currently have and install the latest one directly from GitHub using:

pip install https://github.com/scikit-learn-contrib/scikit-learn-extra/archive/master.zip

Sample (as shown in docs):

!pip install https://github.com/scikit-learn-contrib/scikit-learn-extra/archive/master.zip

from sklearn_extra.cluster import KMedoids
import numpy as np

X = np.asarray([[1, 2], [1, 4], [1, 0],
                [4, 2], [4, 4], [4, 0]])
kmedoids = KMedoids(n_clusters=2, random_state=0, method="pam").fit(X)
kmedoids.labels_

Output:

array([0, 0, 0, 1, 1, 1])

Upvotes: 1
