Reputation: 53
I need to normalize the eigenvectors for two arrays of different shapes. After writing this code:
from sklearn import preprocessing

scaler = preprocessing.StandardScaler()
scaled_df1 = scaler.fit_transform(doc2Vec)
scaled_df2 = scaler.transform(new_vec)

where doc2Vec = np.zeros((75000, 50)) and new_vec = np.zeros((75000, 3)).
The error is:

File "D:/PycharmProjects/sentiment_analysis_text/Sentiment_analysis-master/project/Word2Vec.py", line 232, in combined_features
    scaled_df2 = scaler.transform(new_vec)
File "D:\PycharmProjects\sentiment_analysis_text\venv\lib\site-packages\sklearn\preprocessing\_data.py", line 806, in transform
    X -= self.mean_
ValueError: operands could not be broadcast together with shapes (75000,3) (50,) (75000,3)
Note: I know I am standardising eigenvectors of different dimensions, and that is intentional. How do I fix the code? Or how should they be standardised?
Help, please
Upvotes: 1
Views: 3875
Reputation: 33147
The model is fitted on doc2Vec = np.zeros((75000, 50)) and then asked to transform new_vec = np.zeros((75000, 3)). The dimensionality is not the same, which is why the error is raised.
To overcome this, @glemaitre's answer provides a "custom" Scaler.
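Alternatively, if each matrix only needs to be standardised independently (an assumption about the goal here, not something the question states), a simpler option is to fit a separate StandardScaler per matrix, so the column counts never have to match:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

doc2Vec = np.random.randn(75000, 50)
new_vec = np.random.randn(75000, 3)

# One scaler per matrix: each learns its own per-column mean/std,
# so there is no shape mismatch at transform time.
scaled_df1 = StandardScaler().fit_transform(doc2Vec)
scaled_df2 = StandardScaler().fit_transform(new_vec)

print(scaled_df1.shape, scaled_df2.shape)  # (75000, 50) (75000, 3)
```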
Upvotes: 0
Reputation: 1003
You cannot use a StandardScaler here, because the X given at transform must have the same number of columns as the X given at fit. Basically, at fit you learnt a vector of means and standard deviations of dimension (50,), while you are trying to apply these statistics to a new X with only 3 columns.
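The mismatch can be reproduced in a few lines (a minimal sketch, not from the original post):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit on 50 columns: the scaler learns statistics of shape (50,).
scaler = StandardScaler().fit(np.zeros((10, 50)))

try:
    # Transforming 3 columns cannot line up with the 50 learned statistics.
    scaler.transform(np.zeros((10, 3)))
except ValueError as e:
    print(e)
```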
So if I understand correctly how you want to normalize, you can make your own scaler class:
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.base import TransformerMixin

class MyScaler(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        self.means_ = X.mean(axis=0)
        self.std_dev_ = X.std(axis=0)
        return self

    def transform(self, X, y=None):
        # Use only the first X.shape[1] statistics, so inputs with
        # fewer columns than the training data can still be scaled.
        return (X - self.means_[:X.shape[1]]) / self.std_dev_[:X.shape[1]]

X_train = np.random.randn(5, 5)
X_test = np.random.randn(5, 2)

scaler = MyScaler()
print(scaler.fit_transform(X_train))
print(scaler.transform(X_test))
and you will get something like:
[[-1.46691268 -1.45361873 -0.45377612 0.49119234 -1.08791771]
[-0.21413664 0.71465686 -0.29978242 0.58696079 -0.30286673]
[ 0.5123138 0.17096286 -0.34478627 -1.44547556 -0.51613175]
[ 1.54792261 -0.76139957 -0.86188385 1.24605112 0.04787813]
[-0.37918709 1.32939857 1.96022866 -0.8787287 1.85903806]]
[[ 0.38906146 2.27223431]
[-0.340497 -0.42958738]
[-0.30017852 1.84465534]
[ 0.79533469 4.49370725]
[-0.23766821 1.8216171 ]]
Upvotes: 2