DumbCoder


Pass arguments to FunctionTransformer in Pipeline

I have been learning about sklearn preprocessing and pipelines and came across FunctionTransformer. I want to understand how, once it is integrated into a pipeline, one passes arguments to the function that FunctionTransformer wraps. Consider the example below; for simplicity, I have written a small function:

def return_selected_dataset(dataset, columns):
    return dataset[columns]

pipe = Pipeline([('Return_Col', FunctionTransformer(return_selected_dataset))])
pipe.fit_transform(dataset, columns = ['Col1', 'Col2'])

I am getting the following error: ValueError: Pipeline.fit does not accept the columns parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.

How can I pass the value of columns to the function? Also, can someone suggest any book or website where I can study the sklearn pipelines and preprocessing in detail and how to customize these processes?

Answers (1)

StupidWolf

Example dataset:

import numpy as np
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline
import pandas as pd

X = pd.DataFrame({'Col1':[1,2],'Col2':[3,4],'Col3':[5,6]})

Your function:

def return_selected_dataset(dataset, columns):
    return dataset[columns]

Without a pipeline, you pass the extra argument through the kw_args parameter of FunctionTransformer:

FunctionTransformer(return_selected_dataset,
                    kw_args={'columns': ['Col1', 'Col2']}).transform(X)
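An alternative that the answer does not mention is to bind the argument with functools.partial before wrapping the function. The sketch below assumes the same toy DataFrame and function as above; it is an equivalent way to get the same selection, not a replacement for kw_args.

```python
from functools import partial

import pandas as pd
from sklearn.preprocessing import FunctionTransformer

X = pd.DataFrame({'Col1': [1, 2], 'Col2': [3, 4], 'Col3': [5, 6]})

def return_selected_dataset(dataset, columns):
    return dataset[columns]

# Bind `columns` up front; FunctionTransformer then calls the partial with X only.
select_cols = partial(return_selected_dataset, columns=['Col1', 'Col2'])
result = FunctionTransformer(select_cols).fit_transform(X)
print(result)
```

One design difference: with kw_args the column list stays visible to get_params and set_params, while a partial hides it inside the func parameter, so it cannot be changed or tuned separately.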

Note that with a pipeline, keyword arguments given to fit or fit_transform are only forwarded to the fit method of individual steps; see the help page:

**fit_params : dict of string -> object
    Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
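To illustrate the s__p convention from the help page, here is a small sketch using a step whose fit method does accept an extra parameter (sample_weight, as in the error message). The data, weights and step names 'scale' and 'clf' are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, purely illustrative.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y = np.array([0, 0, 1, 1])
weights = np.array([1.0, 1.0, 2.0, 2.0])

pipe = Pipeline([('scale', StandardScaler()), ('clf', LogisticRegression())])

# 'clf__sample_weight' is split on '__': step 'clf' receives
# sample_weight=weights in its fit call; 'scale' gets no extra parameters.
pipe.fit(X, y, clf__sample_weight=weights)
print(pipe.named_steps['clf'].coef_)
```

The same prefixing would not help in the question's case, because FunctionTransformer.fit has no columns parameter to receive the value.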

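Since FunctionTransformer.fit does not take a columns argument, routing it through fit_transform cannot work. Instead, set kw_args when you build the pipeline: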

pipe = Pipeline([
    ('Return_Col',
     FunctionTransformer(return_selected_dataset,
                         kw_args={'columns': ['Col1', 'Col2']})),
])

pipe.fit_transform(X)
 
   Col1  Col2
0     1     3
1     2     4
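Because kw_args is an ordinary constructor parameter of FunctionTransformer, it can also be changed after the pipeline is built with Pipeline.set_params, using the same stepname__parameter naming. A minimal sketch, reusing the toy data above (selecting Col3 is just an example):

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

X = pd.DataFrame({'Col1': [1, 2], 'Col2': [3, 4], 'Col3': [5, 6]})

def return_selected_dataset(dataset, columns):
    return dataset[columns]

pipe = Pipeline([
    ('Return_Col',
     FunctionTransformer(return_selected_dataset,
                         kw_args={'columns': ['Col1', 'Col2']})),
])

# Replace the constructor parameter kw_args of step 'Return_Col'.
pipe.set_params(Return_Col__kw_args={'columns': ['Col3']})
out = pipe.fit_transform(X)
print(out)
```

This is also what lets a tool such as GridSearchCV treat Return_Col__kw_args as a searchable parameter.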
