Deploy sklearn model with custom transformer

Question

I have a sklearn pipeline that has been defined in the following way:

from tools.transformers import MyTransformer

...

pipe = Pipeline([
    ('mytransformer', MyTransformer()),
    ('lm', LinearRegression())
])

...

The structure of my code is

src
├── __init__.py
├── train.py
└── tools
    └── transformers.py

I have trained my model and saved the pipeline in a .joblib file. Now I want to use the model in another project. However, I have to move not only the .joblib file but also the whole tools/transformers.py structure. I find this difficult to maintain and hard to understand.

Is there an easier way to make the pipeline work without having to move the code around with the exact same structure?

Danylo Baibak · Accepted Answer

You need to create a separate project, for instance internal_lib, and move into it all the custom logic that you use across different projects. Then install internal_lib into your Python environment (via pip or conda). After that, you can pickle a trained pipeline and reuse it in another project. This works because the pickled pipeline stores only the import path of MyTransformer, not its code, so the same module has to be importable wherever the file is loaded.
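To illustrate why the module has to be importable on both sides, here is a minimal sketch using only the standard library. The in-memory internal_lib module and the toy MyTransformer class (with its scale parameter) are stand-ins invented for this example, not the real transformer; joblib relies on the same pickle mechanism for custom classes.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the shared package: a module named "internal_lib"
# built in memory so the sketch runs without installing anything.
internal_lib = types.ModuleType("internal_lib")
exec(
    "class MyTransformer:\n"
    "    def __init__(self, scale=2.0):\n"
    "        self.scale = scale\n"
    "    def transform(self, values):\n"
    "        return [v * self.scale for v in values]\n",
    internal_lib.__dict__,
)
sys.modules["internal_lib"] = internal_lib

# Pickling stores the instance state plus the class's import path,
# not the source code of the class.
payload = pickle.dumps(internal_lib.MyTransformer(scale=3.0))
has_import_path = b"internal_lib" in payload and b"MyTransformer" in payload

# While the module is importable, loading works.
restored = pickle.loads(payload)
result = restored.transform([1, 2])

# Simulate a project where internal_lib is not installed.
del sys.modules["internal_lib"]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = type(exc).__name__

print(has_import_path, result, load_error)
```

The last load fails only because the module can no longer be imported, which is the situation in a second project that has the .joblib file but not the transformer code.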

Technically, internal_lib can be hosted in a private GitHub repo and installed via pip. Here are a couple of links on how to implement this: one, two.
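As a rough sketch of the packaging side, a setup.py along these lines could sit at the root of the internal_lib repository. The version number, the dependency list, and the assumed layout (an internal_lib/ directory containing __init__.py and transformers.py) are illustrative assumptions, not something taken from the original project.

```python
# setup.py at the root of the hypothetical internal_lib repository.
# Assumed layout:
#   internal_lib/__init__.py
#   internal_lib/transformers.py   (contains MyTransformer)
from setuptools import find_packages, setup

setup(
    name="internal_lib",
    version="0.1.0",              # placeholder version
    packages=find_packages(),     # picks up the internal_lib package
    install_requires=["scikit-learn"],
)
```

With that in place, both the training project and the consuming project could install the package with something like pip install git+https://github.com/<your-org>/internal_lib.git (the organisation name is a placeholder) and import the class as from internal_lib.transformers import MyTransformer. The pipeline would need to be trained and saved again with that import, since an existing .joblib file still points at tools.transformers.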
