user9903583

Reputation: 53

StandardScaler ValueError: operands could not be broadcast together with shapes (75000,3) (50,) (75000,3)

I need to normalize the feature vectors of two arrays with different shapes. After writing this code:

    import numpy as np
    from sklearn import preprocessing

    doc2Vec = np.zeros((75000, 50))
    new_vec = np.zeros((75000, 3))

    scaler = preprocessing.StandardScaler()
    scaled_df1 = scaler.fit_transform(doc2Vec)
    scaled_df2 = scaler.transform(new_vec)

The error is:

      File "D:/PycharmProjects/sentiment_analysis_text/Sentiment_analysis-master/project/Word2Vec.py", line 232, in combined_features
        scaled_df2 = scaler.transform(new_vec)
      File "D:\PycharmProjects\sentiment_analysis_text\venv\lib\site-packages\sklearn\preprocessing\_data.py", line 806, in transform
        X -= self.mean_
    ValueError: operands could not be broadcast together with shapes (75000,3) (50,) (75000,3)

Note: I know the two arrays have different dimensions, and I still intend to standardise them. How do I fix the code? Or how should they be standardised?

Help, please

Upvotes: 1

Views: 3875

Answers (2)

seralouk

Reputation: 33147

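The scaler is fitted on `doc2Vec = np.zeros((75000, 50))`, so it learns 50 per-column means and standard deviations. It is then asked to transform `new_vec = np.zeros((75000, 3))`, which has only 3 columns.

The number of features is not the same, so subtracting the learnt means from `new_vec` fails and the error is raised.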
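The failure is plain NumPy broadcasting: subtracting a length-50 vector from an array with 3 columns compares the trailing dimensions (3 vs 50), which neither match nor equal 1. A minimal sketch with the row count reduced to 4; the names `X_fit`, `X_new` and `subtract_means` are illustrative only:

```python
import numpy as np


def subtract_means(X, means):
    """Subtract per-column means; broadcasting requires len(means) == X.shape[1]."""
    return X - means


X_fit = np.zeros((4, 50))   # stands in for doc2Vec (rows reduced from 75000)
X_new = np.zeros((4, 3))    # stands in for new_vec
means = X_fit.mean(axis=0)  # shape (50,), like StandardScaler.mean_

try:
    subtract_means(X_new, means)  # trailing dims 3 vs 50 -> ValueError
    broadcast_ok = True
except ValueError as exc:
    broadcast_ok = False
    print(exc)

# Matching trailing dimensions broadcast fine: (4, 3) - (3,)
print(subtract_means(X_new, means[:3]).shape)  # (4, 3)
```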

To overcome this, @glemaitre's answer provides a "custom" Scaler.

Upvotes: 0

glemaitre

Reputation: 1003

You cannot use a `StandardScaler` here, because the `X` given to `transform` must have the same number of columns as the `X` given to `fit`.

Basically, at fit time you learnt a vector of means and a vector of standard deviations, each of shape (50,), and you then try to apply these statistics to a new `X` with only 3 columns.
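A small sketch of both points, with random data standing in for the real Doc2Vec vectors (row count reduced to 100). The last part, one `StandardScaler` per array, is an alternative not taken from this answer: it only fits the case where each array should be standardised with its own statistics rather than with those learnt on `doc2Vec`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
doc2Vec = rng.normal(size=(100, 50))  # rows reduced from 75000 for the demo
new_vec = rng.normal(size=(100, 3))

scaler = StandardScaler().fit(doc2Vec)
print(scaler.mean_.shape, scaler.scale_.shape)  # (50,) (50,)

try:
    scaler.transform(new_vec)  # 3 columns vs 50 learnt statistics
    rejected = False
except ValueError as exc:
    # The exact message depends on the scikit-learn version
    rejected = True
    print(type(exc).__name__, exc)

# Alternative (assumption about the goal): standardise each array
# with its own statistics by fitting a separate scaler per array.
scaled_df1 = StandardScaler().fit_transform(doc2Vec)
scaled_df2 = StandardScaler().fit_transform(new_vec)
print(scaled_df1.shape, scaled_df2.shape)  # (100, 50) (100, 3)
```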

So if I understand correctly how you want to normalize, you can make your own scaler class:

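    import numpy as np

    from sklearn.base import BaseEstimator
    from sklearn.base import TransformerMixin


    class MyScaler(TransformerMixin, BaseEstimator):
        """Standardise with per-column statistics learnt at fit time.

        At transform time, an X with k columns is scaled with the
        statistics of the first k fitted columns.
        """

        def fit(self, X, y=None):
            # Per-column mean and (population) standard deviation
            self.means_ = X.mean(axis=0)
            self.std_dev_ = X.std(axis=0)
            return self

        def transform(self, X, y=None):
            # Slice the learnt statistics to the number of columns in X
            return (X - self.means_[:X.shape[1]]) / self.std_dev_[:X.shape[1]]


    X_train = np.random.randn(5, 5)
    X_test = np.random.randn(5, 2)

    scaler = MyScaler()
    print(scaler.fit_transform(X_train))
    print(scaler.transform(X_test))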

and you will get something like:

    [[-1.46691268 -1.45361873 -0.45377612  0.49119234 -1.08791771]
     [-0.21413664  0.71465686 -0.29978242  0.58696079 -0.30286673]
     [ 0.5123138   0.17096286 -0.34478627 -1.44547556 -0.51613175]
     [ 1.54792261 -0.76139957 -0.86188385  1.24605112  0.04787813]
     [-0.37918709  1.32939857  1.96022866 -0.8787287   1.85903806]]

    [[ 0.38906146  2.27223431]
     [-0.340497   -0.42958738]
     [-0.30017852  1.84465534]
     [ 0.79533469  4.49370725]
     [-0.23766821  1.8216171 ]]
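To make explicit what `MyScaler.transform` does with fewer columns: it reuses the statistics of the first k training columns, so its output should match a `StandardScaler` fitted on `X_train[:, :k]`. A quick check, relying on both using the population standard deviation (`ddof=0`), which is the default for NumPy's `std` and for `StandardScaler`; the seed and the names `mine`/`ref` are illustrative:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class MyScaler(TransformerMixin, BaseEstimator):
    """Same logic as the class above: statistics of the first k fitted columns."""

    def fit(self, X, y=None):
        self.means_ = X.mean(axis=0)
        self.std_dev_ = X.std(axis=0)  # ddof=0, as StandardScaler uses
        return self

    def transform(self, X, y=None):
        k = X.shape[1]
        return (X - self.means_[:k]) / self.std_dev_[:k]


rng = np.random.default_rng(42)
X_train = rng.normal(size=(5, 5))
X_test = rng.normal(size=(5, 2))

mine = MyScaler().fit(X_train).transform(X_test)
# Reference: a StandardScaler fitted only on the first 2 training columns
ref = StandardScaler().fit(X_train[:, :2]).transform(X_test)
print(np.allclose(mine, ref))  # True
```

One difference worth knowing: `StandardScaler` replaces a zero standard deviation with 1, while `MyScaler` divides by it directly, so constant columns (such as the all-zero arrays in the question) would produce `nan` with `MyScaler`.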

Upvotes: 2
