hkj447

Reputation: 737

Docker - how to use a saved file created in the container

Objective: train a machine learning model in one script (train_model.py), save the model to a .joblib file (Inference_xgb.joblib), load the model in a second script (Inference.py), use the model to make predictions, and save the output.

Issue: Inference.py cannot find the Inference_xgb.joblib file.

Relevant code snippets:

Training (train_model.py):

#!/usr/bin/python3

import pandas as pd
from xgboost import XGBClassifier
from joblib import dump

def train():
    # load in and read training data
    training = './train.csv'
    data_train = pd.read_csv(training)
    label = data_train['2020 Failure'] # what we want to predict
    features = data_train.drop(['2020 Failure', 'FACILITYID'], axis=1)  # columns the model learns from
    features = features.drop('Unnamed: 0', axis=1)
    x_train = features
    y_train = label

    # XGBoost model training
    xgb_model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
    xgb_model.fit(x_train, y_train)
    # save model
    dump(xgb_model, 'Inference_xgb.joblib')

if __name__ == '__main__':
    train()

Testing (Inference.py):

#!/usr/bin/python3

import pandas as pd
from joblib import load
from sklearn.metrics import confusion_matrix
import os

def inference():
    # load and read in test data
    testing = './test.csv'
    data_test = pd.read_csv(testing)

    label = data_test['2020 Failure'] # what we want to predict
    features = data_test.drop(['2020 Failure', 'FACILITYID'], axis=1)  # columns the model predicts from
    features = features.drop('Unnamed: 0', axis=1)
    IDS = data_test['FACILITYID']
    x_test = features
    y_test = label

    # run model
    xgb_model = load('Inference_xgb.joblib')
    y_label = xgb_model.predict(x_test)
    cm = confusion_matrix(y_test,y_label)
    print("Confusion Matrix: ")
    print(cm)

    # write results
    dirpath = os.getcwd()
    print('CURRENT PATH: ', dirpath)
    output_path = os.path.join(dirpath, 'output.csv')
    output_df = pd.DataFrame(y_label, columns=['Prediction'])
    output_df.insert(0, "FACILITYID", IDS.values)
    output_df.to_csv(output_path)
    print('OUTPUT DF')
    print(output_df)

if __name__ == "__main__":
    inference()

Dockerfile:

FROM jupyter/scipy-notebook 

RUN pip install joblib
RUN pip install xgboost==1.5.0

USER root

WORKDIR /scaleable-model

COPY train.csv ./train.csv
COPY test.csv ./test.csv

COPY train_model.py ./train_model.py
COPY inference.py ./inference.py

RUN python3 train_model.py

Comments, observations, and what I've tried:

I've noticed that removing WORKDIR /scaleable-model fixes the issue, but I want to keep WORKDIR set to /scaleable-model so I can mount the .csv output to my host machine.

I am running docker build in the scaleable-model directory on my host machine. That is, I cd to /home/user/pathto/scaleable-model and run docker build -t scaleable-model -f Dockerfile .

I then call docker run and specify Inference.py as the script to run; this is what produces the error.

I've tried hardcoded paths as well, but this did not help. I also created an Inference_xgb.joblib file on my host machine in the same directory where I am building the image, but this did nothing either.

I suspect that either:

To quote Michael Burry, "I guess when someone's wrong, they never know how". I'd like to try to understand the how here.

EDIT: Checking the contents of the container, the file (Inference_xgb.joblib) IS being created in the directory that I want (/scaleable-model). Therefore, it must be an issue with Inference.py not detecting the file for some reason.
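
One way to take the working directory out of the picture is to resolve the model path relative to the script itself instead of relative to wherever the process was started. The following is only a sketch: resolve_model_path is a hypothetical helper (not part of joblib), and it assumes the model file sits in the same directory as Inference.py, as the Dockerfile above suggests. If the file is missing, the error message lists what the script can actually see at run time, which may differ from the image contents (for example, if a host directory is mounted over /scaleable-model).

```python
from pathlib import Path

MODEL_NAME = "Inference_xgb.joblib"  # file name taken from the question


def resolve_model_path(script_file, model_name=MODEL_NAME):
    """Return the path of model_name next to script_file.

    Hypothetical helper: raises FileNotFoundError with diagnostics
    (current working directory and directory listing) if the file is absent.
    """
    script_dir = Path(script_file).resolve().parent
    model_path = script_dir / model_name
    if not model_path.is_file():
        # Show what is really visible from inside the running container.
        visible = sorted(p.name for p in script_dir.iterdir())
        raise FileNotFoundError(
            f"{model_path} not found; cwd={Path.cwd()}, "
            f"files in {script_dir}: {visible}"
        )
    return model_path
```

In Inference.py this would replace the bare file name, e.g. load(resolve_model_path(__file__)), since joblib's load accepts a path object as well as a string.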

Upvotes: 1

Views: 603

Answers (1)

krskara

Reputation: 319

To verify whether the model file is being created in the container, you can:

  • Create a container and start a bash terminal

    docker run -it <image_name> bash

  • Check the current directory - this should be /scaleable-model

    pwd

  • List the contents of the directory - this should show the model file

    ls
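
The same check can also be made from Python, which has the advantage of reporting what the script sees under the exact docker run options used for inference (including any mounted volumes). This is a minimal sketch, not a definitive diagnostic: describe_workdir is a hypothetical name, and the default file name is simply the one from the question.

```python
import os


def describe_workdir(expected_file="Inference_xgb.joblib"):
    """Report the current directory, its entries, and whether the model is there."""
    cwd = os.getcwd()                    # equivalent of `pwd`
    entries = sorted(os.listdir(cwd))    # equivalent of `ls`
    return {
        "cwd": cwd,
        "entries": entries,
        "model_present": expected_file in entries,
    }


if __name__ == "__main__":
    info = describe_workdir()
    print("Working directory:", info["cwd"])
    print("Contents:", info["entries"])
    print("Model file present:", info["model_present"])
```

Calling describe_workdir() at the top of inference() and printing the result would show whether the directory seen at run time matches the one inspected with bash.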

Upvotes: 1
