David Masip

Reputation: 2531

Deploy sklearn model with custom transformer

I have a sklearn pipeline that has been defined in the following way:

from tools.transformers import MyTransformer

...

pipe = Pipeline([
    ('mytransformer', MyTransformer()),
    ('lm', LinearRegression())
])

...

The structure of my code is

src
├── __init__.py
├── train.py
└── tools
    └── transformers.py

I have trained my model and my pipeline is saved in a .joblib file. Now I want to use my model in another project. However, I need to move not only the .joblib file, but the whole tools/transformers.py structure. I think this is kind of difficult to maintain and hard to understand.

Is there an easier way to make the pipeline work without the need of moving the code around with the exact same structure?

Upvotes: 1

Views: 573

Answers (2)

crcastillo

Reputation: 96

You should be able to use cloudpickle to serialize your custom module (transformers.py) by value together with the pipeline, so that the module does not need to be importable in the project that loads the pickle file.

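import cloudpickle

import tools.transformers  # the module that defines MyTransformer

# register_pickle_by_value expects a module object, not a class
cloudpickle.register_pickle_by_value(tools.transformers)

# `pipe` is the fitted Pipeline from the question
with open('./Pipe.cloudpkl', mode='wb') as file:
    cloudpickle.dump(obj=pipe, file=file)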

Upvotes: 0

Danylo Baibak

Reputation: 2326

You need to create a separate project, for instance internal_lib, and move into it all the custom logic that you use across your projects. Then install internal_lib into your Python environment (via pip or conda) in every project that needs it. After that, you will be able to pickle a trained pipeline and reuse it in another project.

Technically, it can be implemented as a private GitHub repo and installed via pip. Here are a couple of links on how to implement it: one, two.
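To illustrate why the shared package has to be installed wherever the pipeline is loaded, here is a minimal sketch using only the standard library. It assumes a throwaway module name, demo_tools_transformers, and a toy MyTransformer class; both are hypothetical stand-ins for the real tools.transformers module. The standard pickle format, which joblib builds on, stores a class as a reference to its module and name rather than as code, so loading fails until a module with that name is importable again.

```python
import pickle
import sys
import types

# Hypothetical module name, standing in for the real tools.transformers.
MODULE_NAME = "demo_tools_transformers"


def make_module():
    """Build an in-memory module that defines a toy MyTransformer."""
    module = types.ModuleType(MODULE_NAME)
    source = (
        "class MyTransformer:\n"
        "    def transform(self, X):\n"
        "        return [x * 2 for x in X]\n"
    )
    exec(source, module.__dict__)
    return module


# "Training project": the module is importable, so pickling works.
module = make_module()
sys.modules[MODULE_NAME] = module
payload = pickle.dumps(module.MyTransformer())

# The payload only names the module; it does not contain the class code.
references_module = MODULE_NAME.encode() in payload

# "Other project" without the package: unpickling cannot find the module.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

# "Other project" with the package installed: unpickling works again.
sys.modules[MODULE_NAME] = make_module()
restored = pickle.loads(payload)
doubled = restored.transform([1, 2, 3])
```

Installing internal_lib in both projects plays the role of the last step: the import path recorded in the pickle resolves in both environments, without copying the source tree by hand.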

Upvotes: 1
