Reputation: 1010
I am trying to build an API using an MLflow model.
The funny thing is that it works from one location on my PC but not from another. (The reason I moved it is that I wanted to change my repo, etc.)
So, this simple code
from mlflow.pyfunc import load_model
MODEL_ARTIFACT_PATH = "./model/model_name/"
MODEL = load_model(MODEL_ARTIFACT_PATH)
now fails with the following error:
ERROR: Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 540, in lifespan
async for item in self.lifespan_context(app):
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 481, in default_lifespan
await self.startup()
File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 516, in startup
await handler()
File "/code/./app/main.py", line 32, in startup_load_model
MODEL = load_model(MODEL_ARTIFACT_PATH)
File "/usr/local/lib/python3.8/dist-packages/mlflow/pyfunc/__init__.py", line 733, in load_model
model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
File "/usr/local/lib/python3.8/dist-packages/mlflow/spark.py", line 737, in _load_pyfunc
return _PyFuncModelWrapper(spark, _load_model(model_uri=path))
File "/usr/local/lib/python3.8/dist-packages/mlflow/spark.py", line 656, in _load_model
return PipelineModel.load(model_uri)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/util.py", line 332, in load
return cls.read().load(path)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/pipeline.py", line 258, in load
return JavaMLReader(self.cls).load(path)
File "/usr/local/lib/python3.8/dist-packages/pyspark/ml/util.py", line 282, in load
java_obj = self._jread.load(path)
File "/usr/local/lib/python3.8/dist-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/utils.py", line 117, in deco
raise converted from None
pyspark.sql.utils.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.
The model artifacts have already been downloaded to the /model folder, which has the following structure.
The load_model call is in the main.py file. As I mentioned, it works from another directory, and there are no references to any absolute paths. I have also made sure that my package versions are identical, e.g. I have pinned them all down:
# Model
mlflow==1.25.1
protobuf==3.20.1
pyspark==3.2.1
scipy==1.6.2
six==1.15.0
Also, the same Dockerfile is used in both places, which, among other things, ensures that the resulting folder structure is the same:
# ...other stuff
COPY ./app /code/app
COPY ./model /code/model
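One thing worth ruling out when code works from one location but not another: a relative path such as `./model/model_name/` is resolved against the process's current working directory, not against the location of `main.py`. Below is a minimal sketch of anchoring the path to the source file instead. It assumes the `/code/app/main.py` and `/code/model/` layout that the COPY lines above produce. `artifact_path` is a hypothetical helper name.

```python
from pathlib import Path


def artifact_path(anchor_file, *parts):
    """Resolve `parts` against the project root instead of the current working
    directory. The project root is assumed to be the parent of the folder that
    contains `anchor_file` (e.g. /code for /code/app/main.py)."""
    project_root = Path(anchor_file).resolve().parent.parent
    return project_root.joinpath(*parts)


if __name__ == "__main__":
    # Prints the same absolute path no matter which directory this is launched from.
    print(artifact_path(__file__, "model", "model_name"))
```

In `main.py` this would be used as `MODEL_ARTIFACT_PATH = str(artifact_path(__file__, "model", "model_name"))`, so the result no longer depends on the directory the server is started from.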
What could explain this exception being thrown here, when in another location on my PC it works with the same model artifacts?
Since it uses the load_model function, shouldn't it be able to read the Parquet files?
If you have any questions, I can explain further.
EDIT1: I have debugged this a little more inside the Docker container, and it seems the Parquet files in the itemFactors folder (listed in my screenshot above) are not getting copied into my image, even though I have the COPY command to copy all files under the model folder. The _started, _committed and _SUCCESS files are copied, just not the Parquet files. Does anyone know why that would be? I DO NOT have a .dockerignore file. Why are those files ignored while copying?
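The observation in EDIT1 can be checked inside the image without inspecting every folder by hand. Walk the model directory and report Spark output folders that have a `_SUCCESS` marker but no data files. This is a minimal sketch that assumes Spark's usual `part-*` naming for data files. Spark ML `metadata/` folders hold a plain-text `part-00000`, so they are not flagged. The function name and the `/code/model` default are hypothetical choices that mirror the Dockerfile above.

```python
import os
import sys


def find_dirs_missing_data(root):
    """Return directories under `root` that contain a Spark `_SUCCESS` marker
    but no `part-*` data files, i.e. output folders whose payload is missing."""
    suspicious = []
    for dirpath, _dirnames, filenames in os.walk(root):
        has_marker = "_SUCCESS" in filenames
        # Spark names its data files part-...; Parquet ones end in .parquet,
        # while ML metadata folders hold a plain-text part-00000.
        has_data = any(name.startswith("part-") for name in filenames)
        if has_marker and not has_data:
            suspicious.append(dirpath)
    return sorted(suspicious)


if __name__ == "__main__":
    # Hypothetical default: the container path the Dockerfile copies the model to.
    root = sys.argv[1] if len(sys.argv) > 1 else "/code/model"
    missing = find_dirs_missing_data(root)
    for path in missing:
        print("no data files in:", path)
    if missing:
        sys.exit(1)  # non-zero exit so a Dockerfile RUN step would fail the build
```

Run inside the container, or as a `RUN` step after the `COPY` lines, this should list the `itemFactors` folder in the situation EDIT1 describes, instead of deferring the failure to the schema-inference error at startup.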
I found the problem. As I wrote in EDIT1 of my post, on closer inspection the Parquet files were missing from the Docker container. That was strange because I was copying the entire folder in my Dockerfile.
I then realized that I was hitting the problem mentioned here: file paths exceeding 260 characters silently fail and do not get copied into the Docker container. This was really frustrating because nothing failed during the build. Then, at run time, I got the cryptic "Unable to infer schema for Parquet" error, essentially because the Parquet files had not been copied during docker build.
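This root cause can be checked for before building. 260 characters is the classic Windows `MAX_PATH` limit. It counts the terminating NUL, so 259 visible characters is the practical maximum. Because the absolute path is what counts, the same project can plausibly copy fine from one folder and not from another with a longer parent path, which would be consistent with the behaviour in the question. Below is a minimal sketch that lists every file under a directory whose absolute path exceeds that length. It assumes you run it on the machine whose folder is used as the Docker build context. The function and script names are hypothetical.

```python
import os
import sys

# Windows' legacy MAX_PATH is 260 characters *including* the terminating NUL,
# so a full path longer than 259 visible characters can exceed it.
DEFAULT_MAX_LEN = 259


def find_long_paths(root, max_len=DEFAULT_MAX_LEN):
    """Return (length, path) for every file under `root` whose absolute path
    is longer than `max_len` characters, longest first."""
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(os.path.abspath(root)):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if len(full) > max_len:
                too_long.append((len(full), full))
    return sorted(too_long, reverse=True)


if __name__ == "__main__":
    # e.g. `python find_long_paths.py ./model` from the project folder
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for length, path in find_long_paths(root):
        print(f"{length:4d}  {path}")
```

Any hit is a candidate for being silently dropped during the copy. Moving the project to a shorter parent directory keeps the same relative layout while reducing every absolute path length.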