Reputation: 1265
I have a dataset of R-D curves such as the following.
(33.3987 34.7318 35.9673 36.8494 37.6992 38.422)
(3929.76 4946.93 6069.78 7243.61 8185.01 9387.84)
Each curve consists of two 6-D vectors: the first holds the PSNR values and the second the corresponding bitrates. I am trying to cluster these curves with K-Means, but how should I pass them as input? Do I need to enter each column as a 2-D point, such as (33.3987, 3929.76),
or do I have to place the two vectors side by side in a single vector?
(33.3987 34.7318 35.9673 36.8494 37.6992 38.422 3929.76 4946.93 6069.78 7243.61 8185.01 9387.84)
I am confused because I am not sure what shape K-Means expects for its input.
I used this to combine two arrays as input to K-Means:
psnr_bitrate=np.load(r'F:/RD_data_from_twitch_system/RD_data_from_twitch_system/bitrate_1080.npy')
bitrate=np.load(r'F:/RD_data_from_twitch_system/RD_data_from_twitch_system/psnr_1080.npy')#***
kmeans_input=np.array([psnr_bitrate, bitrate])
and it produces this error:
Traceback (most recent call last):
  File "<ipython-input-33-28c2bfac9deb>", line 2, in <module>
    scaled_features = pd.DataFrame((kmeans_input))
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 497, in __init__
    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 190, in init_ndarray
    values = _prep_ndarray(values, copy=copy)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 324, in _prep_ndarray
    raise ValueError(f"Must pass 2-d input. shape={values.shape}")
ValueError: Must pass 2-d input. shape=(2, 71, 6)
Upvotes: 0
Views: 1293
Reputation: 19
You should build a single 2-D NumPy array with one row per vector, i.e. an array of shape (n_vectors, 6).
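from sklearn.cluster import KMeans
import numpy as np

# Each row is one sample and each column one feature: shape (n_vectors, 6).
X = np.array([[33.3987, 34.7318, 35.9673, 36.8494, 37.6992, 38.422],
              [3929.76, 4946.93, 6069.78, 7243.61, 8185.01, 9387.84]])
# n_clusters cannot exceed the number of rows in X (only 2 in this toy example).
kmeans = KMeans(n_clusters=2).fit(X)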
You will need to tune n_clusters for your own data to get good results. See https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html for more information.
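For the arrays in the question, whose stacked shape was (2, 71, 6), one possible way to get a 2-D input is to put the PSNR and bitrate values of each curve side by side, so that every curve becomes one 12-value row. The sketch below is only an illustration under assumptions: the random arrays stand in for the two .npy files, the names psnr, bitrate and X_scaled are hypothetical, and the standardization step is an addition of mine (not part of the answer above) to keep the much larger bitrate values from dominating the Euclidean distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the two (71, 6) arrays loaded from the .npy files.
rng = np.random.default_rng(0)
psnr = np.sort(rng.uniform(30.0, 40.0, size=(71, 6)), axis=1)
bitrate = np.sort(rng.uniform(3000.0, 10000.0, size=(71, 6)), axis=1)

# One row per R-D curve: 6 PSNR values followed by 6 bitrate values.
X = np.hstack([psnr, bitrate])  # shape (71, 12)

# Bitrates are roughly 100x larger than PSNR values, so standardize each
# column before clustering (an assumption about what is wanted here).
X_scaled = StandardScaler().fit_transform(X)

# n_clusters=3 is an arbitrary choice for the sketch.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_  # one cluster index per curve, shape (71,)
```

If you would rather cluster individual (PSNR, bitrate) operating points instead of whole curves, the input would be an array of shape (n_points, 2); which layout is appropriate depends on whether a "sample" is a curve or a single point.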
Upvotes: 1