Reputation: 1616
How to create sklearn pipeline with custom functions? I have two functions: one for cleaning the data and a second for building the model.
def preprocess(df):
    ...  # clean data
    return df_clean

def model(df_clean):
    ...  # split data into train and test and build a random forest model
    return model
So I used FunctionTransformer and created a pipeline:
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer
pipe = Pipeline([
    ("preprocess", FunctionTransformer(preprocess)),
    ("model", FunctionTransformer(model)),
])
pred = pipe.predict_proba(new_test_data)
print(pred)
I know the above is wrong, but I'm not sure how to proceed. Do I need to fit the pipeline on the training data first, and then pass new_test_data?
Upvotes: 5
Views: 4952
Reputation: 11
An easier way to do this is with Kedro: it doesn't care about the object type, and you can write any custom function to use inside a pipeline. You can use kedro.Pipeline to put all your functions in sequence and call them as you would in an sklearn pipeline. The syntax is a little different and more flexible than sklearn's. You can learn more about Kedro in its official documentation.
Upvotes: 0
Reputation: 649
You need to create your own class that inherits from sklearn's BaseEstimator and TransformerMixin. Then implement your function in the fit/transform/fit_transform methods of that class (and, for the final estimator in the pipeline, predict/predict_proba etc.).
Put customized functions in Sklearn pipeline
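A minimal sketch of this approach. The `Preprocessor` class and its NaN-filling `transform` are illustrative stand-ins for your own cleaning logic, not anything from the question:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

class Preprocessor(BaseEstimator, TransformerMixin):
    """Wraps custom cleaning logic so it can sit inside a Pipeline."""

    def fit(self, X, y=None):
        return self  # nothing to learn for this stateless cleaner

    def transform(self, X):
        # stand-in for your real cleaning code: fill NaNs with 0
        return np.nan_to_num(X)

# The model goes in as the final estimator, NOT inside FunctionTransformer,
# so the pipeline exposes predict/predict_proba.
pipe = Pipeline([
    ("preprocess", Preprocessor()),
    ("model", RandomForestClassifier(n_estimators=10, random_state=0)),
])

# toy training data with missing values
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [np.nan, 1.0], [1.0, np.nan]])
y_train = np.array([0, 1, 0, 1])

pipe.fit(X_train, y_train)                       # fit on training data first
proba = pipe.predict_proba(np.array([[0.0, 1.0]]))  # then score new data
```

Note that `fit` must return `self`; `TransformerMixin` then supplies `fit_transform` for free, and the pipeline calls each step's `transform` before handing the result to the final estimator.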
Upvotes: 4