Reputation: 737
Objective: train a machine learning model in a .py file (train_model.py), save the model to a .joblib file (Inference_xgb.joblib), load the model into another .py file (Inference.py), use the model to make predictions, and save the output.
Issue: Inference.py cannot find the Inference_xgb.joblib file.
Relevant code snippets:
Training (train_model.py):
#!/usr/bin/python3
import pandas as pd
from xgboost import XGBClassifier
from joblib import dump

def train():
    # load in and read training data
    training = './train.csv'
    data_train = pd.read_csv(training)
    label = data_train['2020 Failure']  # what we want to predict
    features = data_train.drop(['2020 Failure', 'FACILITYID'], axis=1, inplace=False)  # what we train on the model to learn
    features = features.drop('Unnamed: 0', axis=1)
    x_train = features
    y_train = label
    # XGBoost model training
    xgb_model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
    xgb_model.fit(x_train, y_train)
    # save model
    dump(xgb_model, 'Inference_xgb.joblib')

if __name__ == '__main__':
    train()
Testing (Inference.py):
#!/usr/bin/python3
import pandas as pd
from joblib import load
from sklearn.metrics import confusion_matrix
import os

def inference():
    # load and read in test data
    testing = './test.csv'
    data_test = pd.read_csv(testing)
    label = data_test['2020 Failure']  # what we want to predict
    features = data_test.drop(['2020 Failure', 'FACILITYID'], axis=1)  # what we train on the model to learn
    features = features.drop('Unnamed: 0', axis=1)
    IDS = data_test['FACILITYID']
    x_test = features
    y_test = label
    # run model
    xgb_model = load('Inference_xgb.joblib')
    y_label = xgb_model.predict(x_test)
    cm = confusion_matrix(y_test, y_label)
    print("Confusion Matrix: ")
    print(cm)
    # write results
    dirpath = os.getcwd()
    print('CURRENT PATH: ', dirpath)
    output_path = os.path.join(dirpath, 'output.csv')
    output_df = pd.DataFrame(y_label, columns=['Prediction'])
    output_df.insert(0, "FACILITYID", IDS.values)
    output_df.to_csv(output_path)
    print('OUTPUT DF')
    print(output_df)

if __name__ == "__main__":
    inference()
Dockerfile:
FROM jupyter/scipy-notebook
RUN pip install joblib
RUN pip install xgboost==1.5.0
USER root
WORKDIR /scaleable-model
COPY train.csv ./train.csv
COPY test.csv ./test.csv
COPY train_model.py ./train_model.py
COPY inference.py ./inference.py
RUN python3 train_model.py
Comments, observations, and what I've tried:
I've noticed that removing WORKDIR /scaleable-model fixes the issue, but I want to keep the WORKDIR set to /scaleable-model so I can mount the .csv output to my host machine.
I am running docker build in the scaleable-model directory on my host machine. That is, I cd to /home/user/pathto/scaleable-model and run:
docker build -t scaleable-model -f Dockerfile .
I then call docker run and specify that I want to run Inference.py; this is how the error is generated.
I've tried hardcoded paths as well, but this did not help. I also created an Inference_xgb.joblib file on my host machine in the same directory where I am building the container, but this did nothing either.
I suspect that either:
the Inference_xgb.joblib file is not being created properly in the container, or
Inference.py cannot find the file.
To quote Michael Burry, "I guess when someone's wrong, they never know how". I'd like to try to understand the how here.
EDIT:
Checking the contents of the container, the file (Inference_xgb.joblib) IS being created in the directory that I want (/scaleable-model). Therefore, it must be an issue with Inference.py not detecting the file for some reason.
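Since load('Inference_xgb.joblib') is resolved against the process's current working directory, one thing that could be tried is building the path from the script's own location instead. The following is only a minimal sketch, assuming the model file sits next to Inference.py (as the Dockerfile above arranges); resolve_model_path and require_file are hypothetical helper names, not part of joblib or the original code.

```python
from pathlib import Path


def resolve_model_path(script_file, model_name="Inference_xgb.joblib"):
    """Return the path of model_name in the same directory as script_file.

    Hypothetical helper: assumes the model is stored next to the script.
    """
    return Path(script_file).resolve().parent / model_name


def require_file(path):
    """Return path if it is an existing file, else raise a descriptive error.

    The error message lists what the parent directory actually contains,
    which helps show what the script sees at run time.
    """
    path = Path(path)
    if not path.is_file():
        parent = path.parent
        found = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
        raise FileNotFoundError(f"{path} not found; {parent} contains: {found}")
    return path
```

Inside inference(), this could be used as model_path = require_file(resolve_model_path(__file__)) followed by xgb_model = load(model_path). If the file is still reported missing, the directory listing in the error message shows what is actually present where the script runs.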
Upvotes: 1
Views: 603
Reputation: 319
To verify whether the model file is being created in the container, you can:
Create a container and start a bash terminal
docker run -it <image_name> bash
Check the current directory - this should be /scaleable-model
pwd
List the contents of the directory - this should show the model file
ls
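The same pwd and ls checks can also be made from inside Python, which shows the directory as the script itself sees it. That view may differ from an interactive bash session if, for example, the docker run command mounts a host directory over /scaleable-model or overrides the working directory. This is a rough sketch; describe_runtime_dir is a hypothetical helper name.

```python
import os
from pathlib import Path


def describe_runtime_dir(directory=None):
    """Return (directory, sorted entry names) for directory or the cwd.

    Hypothetical helper mirroring `pwd` followed by `ls`.
    """
    target = Path(directory) if directory is not None else Path(os.getcwd())
    names = sorted(entry.name for entry in target.iterdir())
    return str(target), names


if __name__ == "__main__":
    cwd, names = describe_runtime_dir()
    print("Working directory:", cwd)
    print("Contents:", names)
```

Calling describe_runtime_dir() at the top of inference() and printing the result would show whether Inference_xgb.joblib is visible at the moment the model is loaded.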
Upvotes: 1