"ValueError: all the input arrays must have same number of dimensions" Error in sklearn pipeline

Question

I am building a machine learning pipeline using sklearn's Pipeline. In the preprocessing step, I am trying to apply two different treatments to two different string variables: 1) one-hot encoding on BusinessType and 2) mean encoding on AreaCode, as below:

preprocesses_pipeline = make_pipeline (
    FeatureUnion (transformer_list = [
        ("text_features1",  make_pipeline(
            FunctionTransformer(getBusinessTypeCol, validate=False), CustomOHE()
        )),
        ("text_features2",  make_pipeline(
            FunctionTransformer(getAreaCodeCol, validate=False), MeanEncoding()
        ))
    ])
)

preprocesses_pipeline.fit_transform(trainDF[X_cols])

With TransformerMixin classes defined as:

class MeanEncoding(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        tmp = X['AreaCode1'].map(X.groupby('AreaCode1')['isFail'].mean())
        return tmp.values

class CustomOHE(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        tmp = pd.get_dummies(X)
        return tmp.values

and FunctionTransformer functions returning the desired fields:

def getBusinessTypeCol(df):
    return df['BusinessType']

def getAreaCodeCol(df):
    return df[['AreaCode1','isFail']]

Now when I run the above pipeline, it generates the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in ()
     15 )
     16 
---> 17 preprocesses_pipeline.fit_transform(trainDF[X_cols])

~\Anaconda3\lib\site-packages\sklearn\pipeline.py in fit_transform(self, X, y, **fit_params)
    281         Xt, fit_params = self._fit(X, y, **fit_params)
    282         if hasattr(last_step, 'fit_transform'):
--> 283             return last_step.fit_transform(Xt, y, **fit_params)
    284         elif last_step is None:
    285             return Xt

~\Anaconda3\lib\site-packages\sklearn\pipeline.py in fit_transform(self, X, y, **fit_params)
    747             Xs = sparse.hstack(Xs).tocsr()
    748         else:
--> 749             Xs = np.hstack(Xs)
    750         return Xs
    751 

~\Anaconda3\lib\site-packages\numpy\core\shape_base.py in hstack(tup)
    286         return _nx.concatenate(arrs, 0)
    287     else:
--> 288         return _nx.concatenate(arrs, 1)
    289 
    290 

ValueError: all the input arrays must have same number of dimensions

It seems the error comes from the line containing "MeanEncoding" in the pipeline, since removing it makes the pipeline work fine. I am not sure what exactly is wrong with it. Any help would be appreciated.
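
For reference, the traceback ends in np.hstack, which FeatureUnion uses to join the outputs of its branches. Below is a minimal sketch of what is probably going on, assuming the one-hot branch yields a 2-D array (pd.get_dummies(...).values) and MeanEncoding.transform yields a 1-D array (tmp.values on a Series). The array values are made up for illustration and are not taken from trainDF:

```python
import numpy as np

# Stand-ins for the two FeatureUnion branches (illustrative values only).
ohe_out = np.array([[1, 0], [0, 1], [1, 0]])   # CustomOHE-like output, shape (3, 2)
mean_out = np.array([0.5, 0.0, 0.5])           # MeanEncoding-like output, shape (3,)

try:
    np.hstack([ohe_out, mean_out])
    stacked_ok = True
except ValueError:
    # A 2-D and a 1-D array cannot be concatenated along axis 1.
    stacked_ok = False

# Reshaping the 1-D result into a single column makes both inputs 2-D.
mean_col = mean_out.reshape(-1, 1)             # shape (3, 1)
combined = np.hstack([ohe_out, mean_col])      # shape (3, 3)
```

If that assumption holds, returning tmp.values.reshape(-1, 1) from MeanEncoding.transform should give FeatureUnion a 2-D column that it can stack next to the one-hot columns.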
