yoshcn

Reputation: 37

Kubeflow deployment issue on macOS M1 using Rancher Desktop

I am trying to install Kubeflow from the manifests repository (master branch), using the command

while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done

I am using Kubernetes 1.24 from Rancher Desktop:

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:54:23Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.11+k3s1", GitCommit:"c14436a9ecfffb3be553a06bb0a4fac6122579ce", GitTreeState:"clean", BuildDate:"2023-03-10T21:47:44Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/arm64"}

and kustomize 5.0.0.1.

During the deployment, several pods get stuck in ImagePullBackOff:


NAMESPACE          NAME                                                     READY   STATUS             RESTARTS        AGE
kube-system        helm-install-traefik-crd-hs4r2                           0/1     Completed          0               43m
kube-system        helm-install-traefik-s2c7l                               0/1     Completed          2               43m
kube-system        traefik-64b96ccbcd-tjdz9                                 1/1     Running            1 (5m38s ago)   43m
auth               dex-8579644bbb-p5kc7                                     1/1     Running            1 (5m38s ago)   36m
istio-system       istiod-586fcd6677-nsfvh                                  1/1     Running            1 (5m38s ago)   38m
cert-manager       cert-manager-cainjector-d5dc6cd7f-qrjtt                  1/1     Running            1 (9m11s ago)   37m
kubeflow           metadata-envoy-deployment-76c587bd47-dpxv2               1/1     Running            1 (5m38s ago)   13m
kube-system        local-path-provisioner-687d6d7765-gqnlg                  1/1     Running            1 (5m38s ago)   43m
kube-system        svclb-traefik-1503cd1b-w69sd                             2/2     Running            2 (5m38s ago)   43m
kubeflow           kubeflow-pipelines-profile-controller-5dd5468d9b-nxv99   1/1     Running            0               13m
kube-system        coredns-7b5bbc6644-xd9xp                                 1/1     Running            1 (5m38s ago)   43m
kubeflow           kserve-controller-manager-7879bf6dd7-29bdj               2/2     Running            2 (5m38s ago)   13m
knative-eventing   eventing-controller-5b7bfc8895-vzb4x                     1/1     Running            1 (5m38s ago)   13m
knative-eventing   eventing-webhook-5896d776b-l4xb4                         1/1     Running            1 (5m38s ago)   13m
kubeflow           katib-controller-86d4d45478-pstv7                        1/1     Running            1 (5m38s ago)   13m
cert-manager       cert-manager-7475574-2w29b                               1/1     Running            1 (9m11s ago)   37m
cert-manager       cert-manager-webhook-6868bd8b7-lbvrx                     1/1     Running            1 (5m38s ago)   37m
kube-system        metrics-server-667586758d-59g4s                          1/1     Running            1 (5m38s ago)   43m
kubeflow           katib-db-manager-689cdf95c6-v7jl8                        1/1     Running            1 (7m47s ago)   13m
kubeflow           metacontroller-0                                         1/1     Running            0               12m
kubeflow           cache-server-86584db5d8-fvzq5                            2/2     Running            0               13m
kubeflow           ml-pipeline-persistenceagent-75bccd8b64-n2gfl            2/2     Running            0               13m
knative-serving    net-istio-webhook-6858cd8998-mznfm                       2/2     Running            5 (4m47s ago)   13m
istio-system       cluster-local-gateway-757849494c-cqv88                   1/1     Running            1 (5m38s ago)   13m
istio-system       authservice-0                                            1/1     Running            0               13m
istio-system       istio-ingressgateway-cf7bd56f-9lvmg                      1/1     Running            1 (5m38s ago)   38m
kubeflow           minio-6d6d45469f-8f7qt                                   2/2     Running            1 (5m38s ago)   13m
knative-serving    controller-657b7bb75c-gjxkm                              2/2     Running            4 (4m32s ago)   13m
knative-serving    webhook-76f9bc6584-kzm74                                 2/2     Running            5 (4m39s ago)   13m
knative-serving    domainmapping-webhook-f76bcd89f-qdzg7                    2/2     Running            5 (4m28s ago)   13m
knative-serving    domain-mapping-6c4878cc54-zvwz6                          2/2     Running            5 (4m26s ago)   13m
knative-serving    net-istio-controller-6cb499fccb-g7dvk                    2/2     Running            4 (4m33s ago)   13m
kubeflow           workflow-controller-78c979dc75-gl46c                     2/2     Running            4 (4m29s ago)   13m
kubeflow           katib-mysql-5bc98798b4-v5tbv                             1/1     Running            1               13m
kubeflow           ml-pipeline-scheduledworkflow-6dfcd5dd89-m4lmd           2/2     Running            1 (5m38s ago)   13m
kubeflow           ml-pipeline-viewer-crd-86cbc45d9b-8rrg8                  2/2     Running            4 (4m23s ago)   13m
knative-serving    autoscaler-5cc8b77f4d-ztbzd                              2/2     Running            3 (4m8s ago)    13m
knative-serving    activator-5bbf976855-979ch                               2/2     Running            3 (4m9s ago)    13m
kubeflow           katib-ui-b5d5cf978-djvs5                                 2/2     Running            5 (4m20s ago)   13m
kubeflow           mysql-6878bbff69-pzq2p                                   2/2     Running            0               13m
kubeflow           training-operator-7f768bbbdb-9cp57                       1/1     Running            2 (3m54s ago)   13m
kubeflow           metadata-writer-6c576c94b8-d7dhl                         2/2     Running            2 (3m8s ago)    13m
kubeflow           ml-pipeline-77d4d9974b-vx5sz                             2/2     Running            3 (2m12s ago)   13m
kubeflow           metadata-grpc-deployment-5c8599b99c-zp7qk                2/2     Running            5 (117s ago)    13m
kubeflow           ml-pipeline-visualizationserver-5577c64b45-d2v4b         2/2     Running            0               13m
kubeflow           admission-webhook-deployment-cb6db9648-78rtl             0/1     ImagePullBackOff   0               13m
kubeflow           kserve-models-web-app-f9c576856-88qdc                    1/2     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           centraldashboard-dd9c778b6-78snk                         1/2     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           ml-pipeline-ui-5ddb5b76d8-89hdf                          2/2     Running            7 (2m15s ago)   13m
kubeflow           jupyter-web-app-deployment-cc9cbc696-bvb48               1/2     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           volumes-web-app-deployment-7b998df674-765sm              1/2     ImagePullBackOff   0               13m
kubeflow           tensorboards-web-app-deployment-8474fd9569-4xnst         1/2     ImagePullBackOff   0               13m
kubeflow           notebook-controller-deployment-699589b4f9-bb6fd          1/2     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           profiles-deployment-74f656c59f-qbzlz                     1/3     ImagePullBackOff   1 (5m38s ago)   13m
kubeflow           tensorboard-controller-deployment-5655cc9dbb-5mvfg       2/3     ImagePullBackOff   0               13m

When I inspect one of the problematic pods (for example tensorboard-controller-deployment-5655cc9dbb-5mvfg), I get:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  15m                    default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Warning  FailedScheduling  8m10s                  default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Scheduled         7m57s                  default-scheduler  Successfully assigned kubeflow/tensorboard-controller-deployment-5655cc9dbb-5mvfg to lima-rancher-desktop
  Warning  FailedScheduling  16m                    default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Created           7m43s                  kubelet            Created container istio-init
  Normal   Pulled            7m43s                  kubelet            Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal   Started           7m42s                  kubelet            Started container istio-init
  Normal   Pulling           7m28s                  kubelet            Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0"
  Normal   Pulled            7m8s                   kubelet            Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" in 19.515117093s
  Normal   Created           7m7s                   kubelet            Created container kube-rbac-proxy
  Normal   Created           7m6s                   kubelet            Created container istio-proxy
  Normal   Pulled            7m6s                   kubelet            Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
  Normal   Started           7m6s                   kubelet            Started container kube-rbac-proxy
  Normal   Started           7m5s                   kubelet            Started container istio-proxy
  Warning  Unhealthy         7m2s (x2 over 7m3s)    kubelet            Readiness probe failed: Get "http://10.42.0.104:15021/healthz/ready": dial tcp 10.42.0.104:15021: connect: connection refused
  Normal   Pulling           6m20s (x3 over 7m39s)  kubelet            Pulling image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0"
  Warning  Failed            6m8s (x2 over 7m28s)   kubelet            Failed to pull image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0": no match for platform in manifest: not found
  Warning  Failed            6m8s (x2 over 7m28s)   kubelet            Error: ErrImagePull
  Warning  Failed            5m55s (x3 over 6m47s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff           81s (x20 over 6m47s)   kubelet            Back-off pulling image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc

It looks like the image cannot be found in the Docker registry. Any idea how I should proceed?

Upvotes: 1

Views: 307

Answers (2)

yoshcn

Reputation: 37

After further research, I found that the image I need is not available for macOS M1: the node runs linux/arm64, and the pull fails with "no match for platform in manifest". I need full virtualization in order to make it work.
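One way to check this is to list the architectures an image is published for and compare them with the node's architecture. The following is only a minimal sketch: `list_architectures` and `supports_arch` are hypothetical helper names, and it assumes the Docker CLI is installed and that `docker manifest inspect` prints a JSON manifest list containing "architecture" fields (a single-platform image may not show one).

```shell
#!/bin/sh
# Hypothetical helpers (not part of docker or kubectl).

# Print the distinct architectures named in a JSON manifest list read from stdin.
list_architectures() {
  grep -o '"architecture": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/' | sort -u
}

# Succeed if the manifest list on stdin names the architecture given as $1.
supports_arch() {
  list_architectures | grep -qx "$1"
}

# Usage (needs the Docker CLI, kubectl, and network access):
#   docker manifest inspect docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0 | list_architectures
#   kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
```

If the node reports arm64 and the image's list does not include arm64, the "no match for platform in manifest" error above is the expected result.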

Upvotes: 1

glv

Reputation: 1187

The error speaks for itself: it seems that the image URL in the IMAGE field of the deployment in question doesn't work.

Try changing the image from docker.io/... to kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0

Behind the scenes, Kubernetes runs docker pull ..., which points to Docker Hub.

Upvotes: 0
