a.programmer

Reputation: 57

Clustering One Column According to Some Other Columns in Python

I have a dataset with the columns cat1, cat2, cat3 and city. I want to group the cities into clusters. Is it possible to cluster df['city'] according to the other three columns?

Upvotes: 0

Views: 966

Answers (1)

ForceBru

Reputation: 44828

You can cluster the cats (the rows of cat values) first. Since each row of cats corresponds to one city, you can then use the resulting cluster labels to group the cities:

>>> import pandas as pd
>>> from sklearn.cluster import KMeans
>>> df = pd.DataFrame({'cat1': [-1, -2, -1, 3, 2], 'cat2': [-2, -1, -3, 1, 2], 'city': ['London', 'Paris', 'Lyon', 'Washington', 'Rome']})
>>> # some pairs of cats are all negative,
>>> # some pairs are all positive,
>>> # so we expect two clusters
>>> df
   cat1  cat2        city
0    -1    -2      London
1    -2    -1       Paris
2    -1    -3        Lyon
3     3     1  Washington
4     2     2        Rome
>>> X = df[['cat1', 'cat2']].values
>>> X # the cats
array([[-1, -2],
       [-2, -1],
       [-1, -3],
       [ 3,  1],
       [ 2,  2]])
>>> # cluster the cats and get their labels
>>> # (which group gets 0 and which gets 1 is arbitrary and can differ between runs)
>>> lab = KMeans(2).fit(X).labels_
>>> lab
array([0, 0, 0, 1, 1], dtype=int32)
>>> # use labels to cluster cities
>>> # London, Paris and Lyon have all-negative cats
>>> df.loc[lab == 0, 'city']
0    London
1     Paris
2      Lyon
Name: city, dtype: object
>>> # Washington and Rome have all-positive cats
>>> df.loc[lab == 1, 'city']
3    Washington
4          Rome
Name: city, dtype: object
>>> 
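
The session above uses only two columns and assumes each city appears on exactly one row. Here is a minimal sketch for the question's own setup (cat1, cat2 and cat3), under two assumptions that are not part of the answer above: the cat columns are numeric, and a city that appears on several rows is summarized by the mean of its rows. The data is made up, and the StandardScaler step is optional; it only matters when the columns are on different scales.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: the example above plus a cat3 column and a
# second London row, to show how a repeated city is handled.
df = pd.DataFrame({
    'cat1': [-1, -2, -1, 3, 2, -2],
    'cat2': [-2, -1, -3, 1, 2, -1],
    'cat3': [-1, -3, -2, 2, 3, -1],
    'city': ['London', 'Paris', 'Lyon', 'Washington', 'Rome', 'London'],
})
feature_cols = ['cat1', 'cat2', 'cat3']

# One row per city: average the cat values of repeated cities
# (assumption: the mean is a reasonable summary for your data).
per_city = df.groupby('city')[feature_cols].mean()

# Optional: put every column on the same scale so none dominates the distance.
X = StandardScaler().fit_transform(per_city)

# A fixed random_state makes the labels reproducible between runs;
# n_init is set explicitly because its default changed across scikit-learn versions.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Map each cluster label to the list of cities assigned to it.
clusters = {}
for city, label in zip(per_city.index, labels):
    clusters.setdefault(int(label), []).append(city)

for label, cities in sorted(clusters.items()):
    print(label, cities)
```

If the cat columns hold strings (categories) rather than numbers, they have to be encoded first, for example with pd.get_dummies, because KMeans works on numeric features with Euclidean distance.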

Upvotes: 1
