I have Windows and WSL2 on a network that enforces certificates and SSL aggressively. To use git, pip, or mvn, I have to change my user account settings to put the certificate file in the environment and to trust some pip sites. The central problem is that there is a self-signed certificate in our chain, so git and pip fail certificate verification when they contact their servers.
This causes a strange problem with Docker Desktop on Windows. When mlflow launches a Docker build, the build fails as soon as it tries to use git, pip, or mvn.
I run
$ mlflow models build-docker -m "runs:/25432534435423/Clustering" -n myco_ml/myco_ml_cluster:20241007 --enable-mlserver
The first error is a git error:
Registered model 'pj-cluster' already exists. Creating a new version of this model...
Created version '16' of model 'pj-cluster'.
model_uri: runs:/bd938a4092d14ec4b61a7db95f549459/Clustering_Model
ml_info.model_uri: runs:/bd938a4092d14ec4b61a7db95f549459/Clustering_Model
2024/10/09 10:32:42 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'
2024/10/09 10:32:42 INFO mlflow.pyfunc.backend: Building docker image with name hrb_mll_cluster
[+] Building 2.8s (8/21) docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 2.58kB 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:20.04 1.9s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 3.27kB 0.0s
=> [ 1/17] FROM docker.io/library/ubuntu:20.04@sha256:6d8d9799fe6ab3221965efac00b4c34a2bcc102c086a58dff9e19a08b913c7ef 0.0s
=> => resolve docker.io/library/ubuntu:20.04@sha256:6d8d9799fe6ab3221965efac00b4c34a2bcc102c086a58dff9e19a08b913c7ef 0.0s
=> CACHED [ 2/17] RUN apt-get -y update && DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y --no-install-recommends wget curl nginx ca-certificates bzip2 build-essential cmake git-core 0.0s
=> CACHED [ 3/17] RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncursesw5-dev xz- 0.0s
=> ERROR [ 4/17] RUN git clone --depth 1 --branch $(git ls-remote --tags --sort=v:refname https://github.com/pyenv/pyenv.git | grep -o -E 'v[1-9]+(\.[1-9]+)+$' | tail -1) https://github 0.7s
------
> [ 4/17] RUN git clone --depth 1 --branch $(git ls-remote --tags --sort=v:refname https://github.com/pyenv/pyenv.git | grep -o -E 'v[1-9]+(\.[1-9]+)+$' | tail -1) https://github.com/pyenv/pyenv.git /root/.pyenv:
0.655 fatal: unable to access 'https://github.com/pyenv/pyenv.git/': server certificate verification failed. CAfile: none CRLfile: none
0.662 fatal: repository '/root/.pyenv' does not exist
------
Dockerfile:10
--------------------
9 | libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
10 | >>> RUN git clone \
11 | >>> --depth 1 \
12 | >>> --branch $(git ls-remote --tags --sort=v:refname https://github.com/pyenv/pyenv.git | grep -o -E 'v[1-9]+(\.[1-9]+)+$' | tail -1) \
13 | >>> https://github.com/pyenv/pyenv.git /root/.pyenv
14 | ENV PYENV_ROOT="/root/.pyenv"
--------------------
ERROR: failed to solve: process "/bin/sh -c git clone --depth 1 --branch $(git ls-remote --tags --sort=v:refname https://github.com/pyenv/pyenv.git | grep -o -E 'v[1-9]+(\\.[1-9]+)+$' | tail -1)
https://github.com/pyenv/pyenv.git /root/.pyenv" did not complete successfully: exit code: 128
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/nizz1fi7b3qbmu2dlf87mny06
Traceback (most recent call last):
File "C:\Users\A1146108\tmp\mlflow-01\cluster-02.py", line 56, in <module>
mlflow.models.build_docker(
File "C:\Users\A1146108\venv-mlflow-3.9\lib\site-packages\mlflow\models\python_api.py", line 86, in build_docker
get_flavor_backend(model_uri, docker_build=True, env_manager=env_manager).build_image(
File "C:\Users\A1146108\venv-mlflow-3.9\lib\site-packages\mlflow\pyfunc\backend.py", line 369, in build_image
docker_utils.build_image_from_context(context_dir=cwd, image_name=image_name)
File "C:\Users\A1146108\venv-mlflow-3.9\lib\site-packages\mlflow\models\docker_utils.py", line 230, in build_image_from_context
raise RuntimeError("Docker build failed.")
RuntimeError: Docker build failed.
I went into the mlflow source code, found that git call, and hacked it so that git ignores certificates. Embarrassing, but workable. After that, the git call succeeds, but the build eventually fails at the calls to pip. So I hacked the source code again to make pip ignore certificates. Again, embarrassing, but it works. Then mvn fails, and I hacked that too. I expect that if I keep hacking the mlflow source code to disable SSL checks at every step, the build will eventually succeed.
This is ridiculous, though. This cannot be what mlflow or Docker intend. Why don't other people run into this? It seems, well, awful.
What am I missing?
I know Docker can build within WSL2 itself if I uninstall Docker Desktop for Windows entirely. I followed these instructions: https://dataedo.com/docs/installing-docker-on-windows-via-wsl and the Docker build succeeded. I'm pretty sure it is using my user environment, which has settings in .bashrc, .config\git, and .config\pip.
I'd like to cut mlflow's docker build code out of the workflow; at least then I could see how to customize the Dockerfile more easily.
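A minimal sketch of the kind of customization I have in mind, not a tested fix. It assumes I can get the Dockerfile written to a directory first (as far as I can tell, `mlflow models generate-dockerfile` does this in recent MLflow versions, but I have not confirmed it for my setup) and that the corporate root certificate is available as a PEM file. The file name `corp-ca.crt`, the in-image path, and both function names are hypothetical. The idea is to copy the certificate into the image right after `FROM` and point git, pip, and curl at it through their standard environment variables instead of disabling SSL checks. Maven is not covered here, since Java keeps its own trust store.

```python
import shutil
from pathlib import Path

# Hypothetical names: the cert file name and the in-image directory are my own choices.
CERT_NAME = "corp-ca.crt"
CERT_DIR = "/usr/local/share/ca-certificates"

# Standard variables read by git, pip, requests, curl, and OpenSSL respectively.
CA_ENV_VARS = (
    "GIT_SSL_CAINFO",
    "PIP_CERT",
    "REQUESTS_CA_BUNDLE",
    "CURL_CA_BUNDLE",
    "SSL_CERT_FILE",
)


def inject_ca_cert(dockerfile_text: str, cert_name: str = CERT_NAME) -> str:
    """Return Dockerfile text with the CA cert wired in right after the first FROM."""
    dest = f"{CERT_DIR}/{cert_name}"
    extra = [f"COPY {cert_name} {dest}"]
    extra += [f"ENV {var}={dest}" for var in CA_ENV_VARS]

    out, done = [], False
    for line in dockerfile_text.splitlines():
        out.append(line)
        # Insert once, immediately after the first FROM instruction.
        if not done and line.strip().upper().startswith("FROM "):
            out.extend(extra)
            done = True
    if not done:
        raise ValueError("no FROM instruction found in Dockerfile")
    return "\n".join(out) + "\n"


def patch_build_context(context_dir: Path, cert_path: Path) -> None:
    """Copy the cert into the build context and rewrite its Dockerfile in place."""
    shutil.copy(cert_path, context_dir / cert_path.name)
    dockerfile = context_dir / "Dockerfile"
    dockerfile.write_text(inject_ca_cert(dockerfile.read_text(), cert_path.name))
```

With something like this I would call `patch_build_context(Path("build-ctx"), Path("corp-ca.crt"))` on the generated directory and then run a plain `docker build` against it myself. Note that these variables replace the default CA bundle rather than extend it, so this only makes sense if the corporate certificate signs everything the build talks to.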