david

Reputation: 1265

K-Means clustering with 6d vectors

I have a dataset of R-D curves such as the following.

(33.3987 34.7318 35.9673 36.8494 37.6992 38.422)

(3929.76 4946.93 6069.78 7243.61 8185.01 9387.84)

Each curve gives two 6D vectors, one holding PSNR values and one holding the corresponding bitrates. I am trying to cluster these curves using K-Means, but I am not sure how to pass them as input. Do I need to enter a 2D point for each column, such as (33.3987, 3929.76)? Or do I have to concatenate the two vectors into one? (33.3987 34.7318 35.9673 36.8494 37.6992 38.422 3929.76 4946.93 6069.78 7243.61 8185.01 9387.84) I am confused because I do not know what shape K-Means expects for its input. I used this to combine the two arrays as input to K-Means:

psnr_bitrate=np.load(r'F:/RD_data_from_twitch_system/RD_data_from_twitch_system/bitrate_1080.npy')
bitrate=np.load(r'F:/RD_data_from_twitch_system/RD_data_from_twitch_system/psnr_1080.npy')#***
kmeans_input=np.array([psnr_bitrate],[bitrate])

and it produces this error:

Traceback (most recent call last):

  File "<ipython-input-33-28c2bfac9deb>", line 2, in <module>
    scaled_features = pd.DataFrame((kmeans_input))

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 497, in __init__
    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 190, in init_ndarray
    values = _prep_ndarray(values, copy=copy)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 324, in _prep_ndarray
    raise ValueError(f"Must pass 2-d input. shape={values.shape}")

ValueError: Must pass 2-d input. shape=(2, 71, 6)

Upvotes: 0

Views: 1293

Answers (1)

Sefton de Pledge

Reputation: 19

You should pass the vectors as the rows of a single 2-D array, i.e. a NumPy array of shape (n_vectors, 6).

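from sklearn.cluster import KMeans
import numpy as np

# Each row is one 6-D vector, so X has shape (n_vectors, 6); here that is (2, 6).
X = np.array([[33.3987, 34.7318, 35.9673, 36.8494, 37.6992, 38.422],
              [3929.76, 4946.93, 6069.78, 7243.61, 8185.01, 9387.84]])

# n_clusters cannot exceed the number of rows, so with only these two rows it is at most 2.
kmeans = KMeans(n_clusters=2).fit(X)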

You will need to tune n_clusters for your own data to get good results; it cannot be larger than the number of vectors you pass in. See https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html for more info.
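If you would rather cluster whole R-D curves (PSNR and bitrate together) than individual 6-D vectors, one option is the concatenation mentioned in the question. The sketch below is only an illustration under some assumptions: the two arrays are replaced by random stand-ins of shape (71, 6), the shape implied by the error message, and the columns are standardized with StandardScaler because bitrate values are far larger than PSNR values and would otherwise dominate the Euclidean distance. Neither the scaling step nor n_clusters=3 comes from the original post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Random stand-ins (assumption) for the two (71, 6) arrays loaded in the question.
rng = np.random.default_rng(0)
psnr = rng.uniform(30.0, 40.0, size=(71, 6))
bitrate = rng.uniform(3000.0, 10000.0, size=(71, 6))

# Wrapping both arrays in np.array adds a new leading axis: shape (2, 71, 6).
# That 3-D array is what triggers "Must pass 2-d input".
stacked = np.array([psnr, bitrate])

# Concatenating along the feature axis instead gives one 12-D row per curve.
features = np.hstack([psnr, bitrate])  # shape (71, 12)

# Standardize each column so PSNR and bitrate contribute on a comparable scale.
scaled = StandardScaler().fit_transform(features)

# n_clusters=3 is a placeholder; n_init and random_state make the run repeatable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
labels = kmeans.labels_  # one cluster index per curve, shape (71,)
```

Whether a 12-D concatenation is the right representation depends on what "similar curves" should mean for your data; the point here is only that K-Means expects a 2-D array with one row per sample.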

Upvotes: 1
