Reputation: 865
I want to build a sklearn Pipeline (part of a further larger Pipeline), which :
I used this pipeline example :
and this example for custom TranformerMixin :
I get an error at step 4 (no error if I comment step 4) :
AttributeError Traceback (most recent call last) in () ----> 1 X_train_transformed = pipe.fit_transform(X_train) .... AttributeError: 'numpy.ndarray' object has no attribute 'fit'
My code :
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import TruncatedSVD
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
# does nothing, but is here to collect numerical columns
class nothing(BaseEstimator, TransformerMixin):
def fit(self, X, y=None):
return self
def transform(self, X):
return X
class Aggregator(BaseEstimator, TransformerMixin):
def fit(self, X, y=None):
return self
def transform(self, X):
X = pd.DataFrame(X)
X = X.rename(columns = {0 :'InvoiceNo', 1 : 'amount', 2:'Quantity',
3:'UnitPrice',4:'CustomerID' })
X['InvoiceNo'] = X['InvoiceNo'].astype('int')
X['Quantity'] = X['Quantity'].astype('float64')
X['UnitPrice'] = X['UnitPrice'].astype('float64')
aggregations = dict()
for col in range(5, X.shape[1]-1) :
aggregations[col] = 'max'
aggregations.update({ 'CustomerID' : 'first',
'amount' : "sum",'Quantity' : 'mean', 'UnitPrice' : 'mean'})
# aggregating all basket lines
result = X.groupby('InvoiceNo').agg(aggregations)
# add number of lines in the basket
result['lines_nb'] = X.groupby('InvoiceNo').size()
return result
numeric_features = ['InvoiceNo','amount', 'Quantity', 'UnitPrice',
'CustomerID']
numeric_transformer = Pipeline(steps=[('nothing', nothing())])
categorical_features = ['StockCode', 'Country']
preprocessor = ColumnTransformer(
[
# 'num' transformer does nothing, but is here to
# collect numerical columns
('num', numeric_transformer ,numeric_features ),
('cat', Pipeline([
('onehot', OneHotEncoder(handle_unknown='ignore')),
('best', TruncatedSVD(n_components=100)),
]), categorical_features)
]
)
# edit with Artem solution
# aggregator = ('agg', Aggregator())
pipe = Pipeline(steps=[
('preprocessor', preprocessor),
# edit with Artem solution
# ('aggregator', aggregator),
('aggregator', Aggregator())
])
X_train_transformed = pipe.fit_transform(X_train)
Upvotes: 2
Views: 2825
Reputation: 1415
Pipeline steps are in from ('name', Class), but original task had essentially:
aggregator = ('agg', Aggregator())`
pipe = Pipeline(steps=[
('preprocessor', preprocessor),
('aggregator', aggregator),
])
which made it ('aggregator', ('agg', Aggregator()))
Upvotes: 1