Ben

Reputation: 1522

How to specify operations in a Pipeline?

I'm just getting started with Pipelines, so I first have to learn how they work. I'm trying to follow the procedure from here, but I can't apply it to my case.

There, they write:

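from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline

# create feature union
features = []
features.append(('pca', PCA(n_components=3)))
features.append(('select_best', SelectKBest(k=6)))
feature_union = FeatureUnion(features)
# create pipeline
estimators = []
estimators.append(('feature_union', feature_union))
estimators.append(('logistic', LogisticRegression()))
model = Pipeline(estimators)
# evaluate pipeline (X, Y: your feature matrix and labels)
seed = 7
# random_state only has an effect with shuffle=True; recent scikit-learn raises an error otherwise
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(model, X, Y, cv=kfold)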
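For reference, here is a self-contained, runnable version of the quoted example. It assumes synthetic data from make_classification (10 features, made up for illustration) in place of the unspecified X and Y. It also shows that the FeatureUnion places its transformers' outputs side by side: 3 PCA components plus 6 selected columns.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import FeatureUnion, Pipeline

# Synthetic stand-in for X, Y (assumption: 200 samples, 10 features)
X, Y = make_classification(n_samples=200, n_features=10, random_state=0)

# Each entry is a (name, unfitted estimator) tuple
feature_union = FeatureUnion([
    ('pca', PCA(n_components=3)),
    ('select_best', SelectKBest(k=6)),
])
model = Pipeline([
    ('feature_union', feature_union),
    ('logistic', LogisticRegression()),
])

# The union concatenates 3 PCA components and 6 selected columns
n_union_features = feature_union.fit_transform(X, Y).shape[1]
print(n_union_features)  # 9

# cross_val_score clones and refits the whole pipeline on each training fold
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

Because the scaler, PCA, or feature selector sit inside the pipeline, they are refit on each training fold only, so no information from the held-out fold leaks into the transformations.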

In my case I would like to do the following:

estimators = []
estimators.append('standardize', StandardScaler().fit_transform())
prepare_data = Pipeline(estimators)

But when I call fit_transform() I get the error "fit_transform() missing 1 required positional argument: 'X'". How can I use this method of StandardScaler() inside a pipeline?
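The error happens because fit_transform() is called immediately, before any data exists. A Pipeline instead stores the estimator objects and calls their methods later, once data is passed in. The helper run_transform_steps below is hypothetical and simplified (the real Pipeline also handles a final estimator, "passthrough" steps, caching and more). It only illustrates roughly what happens to transformer steps, and compares the result with an actual Pipeline.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def run_transform_steps(steps, X):
    """Hypothetical, simplified stand-in for Pipeline.fit_transform:
    each step's fit_transform receives the previous step's output."""
    for name, step in steps:
        X = step.fit_transform(X)
    return X


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# Steps are (name, estimator_object) tuples; no method is called here yet
steps = [('standardize', StandardScaler()), ('minmax', MinMaxScaler())]
manual = run_transform_steps(steps, X)

# The real Pipeline takes the same kind of list, built from fresh objects
pipe = Pipeline([('standardize', StandardScaler()), ('minmax', MinMaxScaler())])
piped = pipe.fit_transform(X)

print(np.allclose(manual, piped))  # True
```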

Upvotes: 0

Views: 41

Answers (1)

Guillem

Reputation: 2647

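You should call fit_transform on the Pipeline itself. Add each step as a single (name, estimator) tuple containing the unfitted StandardScaler() object; when you call fit_transform on the Pipeline, it calls fit_transform on the scaler for you.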

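from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = ...  # your feature matrix, shape (n_samples, n_features)
estimators = []
# append takes one argument: a (name, estimator) tuple with the unfitted object
estimators.append(('standardize', StandardScaler()))
prepare_data = Pipeline(estimators)
# The pipeline fits the scaler on X and returns the scaled data
X_scaled = prepare_data.fit_transform(X)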
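A minimal sketch of the usual next step, assuming random training and test arrays made up for illustration. The pipeline is fit on training data only, and new data then goes through transform (not fit_transform), so it is scaled with the training mean and standard deviation. The fitted scaler can be inspected through named_steps. make_pipeline is scikit-learn's shortcut that builds the same kind of Pipeline and names the steps automatically.

```python
import numpy as np
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data for illustration
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(10, 3))

prepare_data = Pipeline([('standardize', StandardScaler())])
# Before fitting, the scaler inside the pipeline has no learned statistics
print(hasattr(prepare_data.named_steps['standardize'], 'mean_'))  # False

# Fit on training data only; mean_ and scale_ are learned here
X_train_scaled = prepare_data.fit_transform(X_train)
print(hasattr(prepare_data.named_steps['standardize'], 'mean_'))  # True

# Scale new data with the training statistics
X_test_scaled = prepare_data.transform(X_test)
expected = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, expected))  # True

# make_pipeline names each step after its lowercased class name
auto = make_pipeline(StandardScaler())
print(list(auto.named_steps))  # ['standardscaler']
```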

Upvotes: 1
