hanzgs

Reputation: 1616

Creating pipeline in sklearn with custom functions?

How do I create an sklearn pipeline with custom functions? I have two functions, one for cleaning the data and a second for building the model.

def preprocess(df):
   ……………….
   # clean data
   return df_clean

def model(df_clean):
   …………………
   #split data train and test and build randomForest Model
   return model

So I used FunctionTransformer and created a pipeline:

from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer

pipe = Pipeline([("preprocess", FunctionTransformer(preprocess)),("model",FunctionTransformer(model))])

pred = pipe.predict_proba(new_test_data)
print(pred)

I know the above is wrong, but I'm not sure how to proceed. Do I need to pass the training data to the pipe first, and then pass new_test_data?

Upvotes: 5

Views: 4952

Answers (2)

Data_explorer

Reputation: 11

A simpler alternative is Kedro: it doesn't care about the object type, so you can use any custom function inside a pipeline. You can use kedro.pipeline.Pipeline to put all your functions in sequence and call them much as you would in an sklearn pipeline. The syntax is a little different from sklearn's and more flexible.

You can learn more about Kedro here or in the official documentation.

Upvotes: 0

Arpit Sisodia

Reputation: 649

You need to create your own class that inherits from sklearn's BaseEstimator and TransformerMixin.

Then call your function from the class's fit/transform/fit_transform or predict/predict_proba methods, as appropriate.

Put customized functions in Sklearn pipeline
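
A minimal sketch of that approach, under some assumptions: the body of preprocess() here (filling missing values with 0) is a made-up stand-in for the question's elided cleaning code, the Preprocessor class name and the toy data are hypothetical, and a stock RandomForestClassifier replaces the custom model() function so the pipeline's last step has fit and predict_proba. The train/test split is assumed to happen outside the pipeline.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline


def preprocess(df):
    # Hypothetical stand-in for the question's cleaning function:
    # it only fills missing values with 0.
    return df.fillna(0)


class Preprocessor(BaseEstimator, TransformerMixin):
    """Wraps preprocess() so it can be used as a pipeline step."""

    def fit(self, X, y=None):
        # Nothing is learned from the data; return self as the API expects.
        return self

    def transform(self, X):
        return preprocess(X)


pipe = Pipeline([
    ("preprocess", Preprocessor()),
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# Toy data, invented for illustration only.
X_train = pd.DataFrame({"a": [1.0, None, 3.0, 4.0], "b": [0.0, 1.0, None, 1.0]})
y_train = [0, 1, 0, 1]
new_test_data = pd.DataFrame({"a": [2.0, None], "b": [None, 1.0]})

pipe.fit(X_train, y_train)                # the training data goes in first
pred = pipe.predict_proba(new_test_data)  # then the new data
print(pred)
```

Calling fit on the pipeline runs the cleaning step and then trains the forest; predict_proba reuses the same cleaning step on the new data before scoring it, so the training data is passed once to fit and the new data only to predict_proba.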

Upvotes: 4
