heisenberg737

Reputation: 49

ValueError when fitting ColumnTransformer

I am trying to fit a ColumnTransformer on my dataset, which has 6 columns (labelled C1, C2, ..., C6). I wrote the following code to create my transformer:

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

dummyData = pd.DataFrame({
    'C1' : ['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05'],
    'C2' : ['W1', 'W2', 'W3', 'W4', 'W5'],
    'C3' : [np.nan, np.nan, 213727, 213613, 217636],
    'C4' : [np.nan, 0, 3, 2.5, np.nan],
    'C5' : [0, 0, 3, 5.5, 5.5],
    'C6' : [487.15, 273.15, 364.15, 463.25, 373.15]
})
preprocessor = ColumnTransformer(transformers=[
    ('missing_ind', MissingIndicator(), ['C3', 'C4']),
    ('impute_num', SimpleImputer(strategy='median'), ['C3', 'C4', 'C5']),
    ('ordinalEncoder', OrdinalEncoder(), ['C2']),
    ('scaler', StandardScaler())
], remainder='passthrough')

preprocessor.fit(dummyData)

However, I'm getting the following error:

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-17-84099700df4f> in <module>()
----> 1 preprocessor.fit(dummyData)

2 frames

/usr/local/lib/python3.6/dist-packages/sklearn/compose/_column_transformer.py in _validate_transformers(self)
    271             return
    272 
--> 273         names, transformers, _ = zip(*self.transformers)
    274 
    275         # validate names

ValueError: not enough values to unpack (expected 3, got 2)

I'm not sure what's causing this error and would appreciate help on this.

Upvotes: 1

Views: 770

Answers (1)

SirAchesis

Reputation: 345

Reading the error message, we can see two things:

  1. The error comes from an unpacking step that expects three values but only receives two.
  2. It happens while fitting the ColumnTransformer, more specifically in _validate_transformers, where the list of individual transformers is set up and checked.
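The failing line from the traceback can be reproduced without scikit-learn. This is a minimal sketch, assuming plain strings stand in for the real transformer objects; it only illustrates why a single two-element tuple is enough to trigger the message.

```python
# Plain strings stand in for the real transformer objects (illustration only).
transformers = [
    ("missing_ind", "MissingIndicator()", ["C3", "C4"]),
    ("impute_num", "SimpleImputer()", ["C3", "C4", "C5"]),
    ("ordinalEncoder", "OrdinalEncoder()", ["C2"]),
    ("scaler", "StandardScaler()"),  # no column list: only two elements
]

# zip(*...) groups the i-th element of every tuple and stops at the shortest
# tuple, so the missing third element drops the whole "columns" group.
groups = list(zip(*transformers))
print(len(groups))  # 2

message = None
try:
    # Same unpacking pattern as the line in the traceback.
    names, objs, _ = zip(*transformers)
except ValueError as exc:
    message = str(exc)
print(message)  # not enough values to unpack (expected 3, got 2)
```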

With that in mind, look at the transformer list:

preprocessor = ColumnTransformer(transformers=[
    ('missing_ind', MissingIndicator(), ['C3', 'C4']),
    ('impute_num', SimpleImputer(strategy='median'), ['C3', 'C4', 'C5']),
    ('ordinalEncoder', OrdinalEncoder(), ['C2']),
    ('scaler', StandardScaler())
], remainder='passthrough')

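Every transformer is given three values (name, transformer, columns) except the last one, the scaler. Because zip(*self.transformers) stops at the shortest tuple, that two-element tuple leaves zip with only two groups to return, which is exactly the "expected 3, got 2" in the error.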
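A quick way to spot such an entry is to check every tuple's length before building the ColumnTransformer. The helper below is hypothetical (find_malformed is not part of scikit-learn), a sketch of that check with plain strings standing in for the transformers.

```python
def find_malformed(transformers):
    """Hypothetical helper: return the names of entries that are not
    (name, transformer, columns) triples."""
    return [entry[0] for entry in transformers if len(entry) != 3]


# Plain strings stand in for the real transformer objects (illustration only).
transformers = [
    ("missing_ind", "MissingIndicator()", ["C3", "C4"]),
    ("impute_num", "SimpleImputer()", ["C3", "C4", "C5"]),
    ("ordinalEncoder", "OrdinalEncoder()", ["C2"]),
    ("scaler", "StandardScaler()"),
]

print(find_malformed(transformers))  # ['scaler']
```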

Specifying which columns you want the scaler to affect gives it the third value it needs and gets rid of the error.
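For example, the fixed preprocessor might look like the sketch below. Scaling C6 is an assumption made only for illustration, since the question does not say which columns should be scaled; substitute your own list. C1 is not listed in any transformer, so remainder='passthrough' appends it unchanged as the last output column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

dummyData = pd.DataFrame({
    'C1': ['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05'],
    'C2': ['W1', 'W2', 'W3', 'W4', 'W5'],
    'C3': [np.nan, np.nan, 213727, 213613, 217636],
    'C4': [np.nan, 0, 3, 2.5, np.nan],
    'C5': [0, 0, 3, 5.5, 5.5],
    'C6': [487.15, 273.15, 364.15, 463.25, 373.15],
})

preprocessor = ColumnTransformer(transformers=[
    ('missing_ind', MissingIndicator(), ['C3', 'C4']),
    ('impute_num', SimpleImputer(strategy='median'), ['C3', 'C4', 'C5']),
    ('ordinalEncoder', OrdinalEncoder(), ['C2']),
    # Assumption: C6 is the column to scale; the tuple now has all three values.
    ('scaler', StandardScaler(), ['C6']),
], remainder='passthrough')

result = preprocessor.fit_transform(dummyData)
# 2 indicator + 3 imputed + 1 encoded + 1 scaled + 1 passthrough (C1) columns
print(result.shape)  # (5, 8)
```

If the goal is instead to scale the imputed numeric columns, one option is to chain SimpleImputer and StandardScaler in a sklearn.pipeline.Pipeline and give that pipeline a single (name, pipeline, columns) entry.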

Upvotes: 2
