Reputation: 57
I have a dataset with columns cat1, cat2, cat3, and city.
I want to group the cities into clusters.
Is it possible to cluster df['city'] according to the other three columns?
Upvotes: 0
Views: 966
Reputation: 44828
You can cluster the rows on the cat columns first. Since each row of cat values corresponds to one city, the resulting labels also assign each city to a cluster:
>>> import pandas as pd
>>> from sklearn.cluster import KMeans
>>> df = pd.DataFrame({'cat1': [-1, -2, -1, 3, 2], 'cat2': [-2, -1, -3, 1, 2], 'city': ['London', 'Paris', 'Lyon', 'Washington', 'Rome']})
>>> # some pairs of cats are all negative,
>>> # some pairs are all positive,
>>> # so we expect two clusters
>>> df
   cat1  cat2        city
0    -1    -2      London
1    -2    -1       Paris
2    -1    -3        Lyon
3     3     1  Washington
4     2     2        Rome
>>> X = df[['cat1', 'cat2']].values
>>> X # the cats
array([[-1, -2],
       [-2, -1],
       [-1, -3],
       [ 3,  1],
       [ 2,  2]])
>>> # cluster the cats and get their labels
>>> # (label numbers are arbitrary; 0 and 1 may be swapped on another run)
>>> lab = KMeans(2).fit(X).labels_
>>> lab
array([0, 0, 0, 1, 1], dtype=int32)
>>> # use labels to cluster cities
>>> # London, Paris and Lyon have all-negative cats
>>> df['city'][lab == 0]
0    London
1     Paris
2      Lyon
Name: city, dtype: object
>>> # Washington and Rome have all-positive cats
>>> df['city'][lab == 1]
3    Washington
4          Rome
Name: city, dtype: object
>>>
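The same idea can be written as a plain script that covers all three columns from the question. This is only a minimal sketch: the cat3 values and the 'cluster' column name are made up for illustration, and n_init and random_state are set only to make runs repeatable. Because cluster numbering is arbitrary, the script groups cities by label instead of relying on a particular number.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data; the cat3 values are invented for illustration.
df = pd.DataFrame({
    'cat1': [-1, -2, -1, 3, 2],
    'cat2': [-2, -1, -3, 1, 2],
    'cat3': [-1, -3, -2, 2, 3],
    'city': ['London', 'Paris', 'Lyon', 'Washington', 'Rome'],
})

# Fit k-means on the numeric columns only and store one label per row.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
df['cluster'] = km.fit_predict(df[['cat1', 'cat2', 'cat3']])

# Collect the cities that share a label.
groups = df.groupby('cluster')['city'].apply(list)
print(groups)
```

If the cat columns hold category names rather than numbers, they would likely need to be encoded first (for example with pd.get_dummies) before being passed to KMeans.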
Upvotes: 1