Reputation: 37
I am trying to install Kubeflow from the master branch of the manifests repository, using the command:
while ! kustomize build example | awk '!/well-defined/' | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
I am using Kubernetes 1.24 from Rancher Desktop:
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4", GitCommit:"95ee5ab382d64cfe6c28967f36b53970b8374491", GitTreeState:"clean", BuildDate:"2022-08-17T18:54:23Z", GoVersion:"go1.18.5", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.11+k3s1", GitCommit:"c14436a9ecfffb3be553a06bb0a4fac6122579ce", GitTreeState:"clean", BuildDate:"2023-03-10T21:47:44Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/arm64"}
and kustomize 5.0.0.1.
During the deployment, several pods end up in ImagePullBackOff:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system helm-install-traefik-crd-hs4r2 0/1 Completed 0 43m
kube-system helm-install-traefik-s2c7l 0/1 Completed 2 43m
kube-system traefik-64b96ccbcd-tjdz9 1/1 Running 1 (5m38s ago) 43m
auth dex-8579644bbb-p5kc7 1/1 Running 1 (5m38s ago) 36m
istio-system istiod-586fcd6677-nsfvh 1/1 Running 1 (5m38s ago) 38m
cert-manager cert-manager-cainjector-d5dc6cd7f-qrjtt 1/1 Running 1 (9m11s ago) 37m
kubeflow metadata-envoy-deployment-76c587bd47-dpxv2 1/1 Running 1 (5m38s ago) 13m
kube-system local-path-provisioner-687d6d7765-gqnlg 1/1 Running 1 (5m38s ago) 43m
kube-system svclb-traefik-1503cd1b-w69sd 2/2 Running 2 (5m38s ago) 43m
kubeflow kubeflow-pipelines-profile-controller-5dd5468d9b-nxv99 1/1 Running 0 13m
kube-system coredns-7b5bbc6644-xd9xp 1/1 Running 1 (5m38s ago) 43m
kubeflow kserve-controller-manager-7879bf6dd7-29bdj 2/2 Running 2 (5m38s ago) 13m
knative-eventing eventing-controller-5b7bfc8895-vzb4x 1/1 Running 1 (5m38s ago) 13m
knative-eventing eventing-webhook-5896d776b-l4xb4 1/1 Running 1 (5m38s ago) 13m
kubeflow katib-controller-86d4d45478-pstv7 1/1 Running 1 (5m38s ago) 13m
cert-manager cert-manager-7475574-2w29b 1/1 Running 1 (9m11s ago) 37m
cert-manager cert-manager-webhook-6868bd8b7-lbvrx 1/1 Running 1 (5m38s ago) 37m
kube-system metrics-server-667586758d-59g4s 1/1 Running 1 (5m38s ago) 43m
kubeflow katib-db-manager-689cdf95c6-v7jl8 1/1 Running 1 (7m47s ago) 13m
kubeflow metacontroller-0 1/1 Running 0 12m
kubeflow cache-server-86584db5d8-fvzq5 2/2 Running 0 13m
kubeflow ml-pipeline-persistenceagent-75bccd8b64-n2gfl 2/2 Running 0 13m
knative-serving net-istio-webhook-6858cd8998-mznfm 2/2 Running 5 (4m47s ago) 13m
istio-system cluster-local-gateway-757849494c-cqv88 1/1 Running 1 (5m38s ago) 13m
istio-system authservice-0 1/1 Running 0 13m
istio-system istio-ingressgateway-cf7bd56f-9lvmg 1/1 Running 1 (5m38s ago) 38m
kubeflow minio-6d6d45469f-8f7qt 2/2 Running 1 (5m38s ago) 13m
knative-serving controller-657b7bb75c-gjxkm 2/2 Running 4 (4m32s ago) 13m
knative-serving webhook-76f9bc6584-kzm74 2/2 Running 5 (4m39s ago) 13m
knative-serving domainmapping-webhook-f76bcd89f-qdzg7 2/2 Running 5 (4m28s ago) 13m
knative-serving domain-mapping-6c4878cc54-zvwz6 2/2 Running 5 (4m26s ago) 13m
knative-serving net-istio-controller-6cb499fccb-g7dvk 2/2 Running 4 (4m33s ago) 13m
kubeflow workflow-controller-78c979dc75-gl46c 2/2 Running 4 (4m29s ago) 13m
kubeflow katib-mysql-5bc98798b4-v5tbv 1/1 Running 1 13m
kubeflow ml-pipeline-scheduledworkflow-6dfcd5dd89-m4lmd 2/2 Running 1 (5m38s ago) 13m
kubeflow ml-pipeline-viewer-crd-86cbc45d9b-8rrg8 2/2 Running 4 (4m23s ago) 13m
knative-serving autoscaler-5cc8b77f4d-ztbzd 2/2 Running 3 (4m8s ago) 13m
knative-serving activator-5bbf976855-979ch 2/2 Running 3 (4m9s ago) 13m
kubeflow katib-ui-b5d5cf978-djvs5 2/2 Running 5 (4m20s ago) 13m
kubeflow mysql-6878bbff69-pzq2p 2/2 Running 0 13m
kubeflow training-operator-7f768bbbdb-9cp57 1/1 Running 2 (3m54s ago) 13m
kubeflow metadata-writer-6c576c94b8-d7dhl 2/2 Running 2 (3m8s ago) 13m
kubeflow ml-pipeline-77d4d9974b-vx5sz 2/2 Running 3 (2m12s ago) 13m
kubeflow metadata-grpc-deployment-5c8599b99c-zp7qk 2/2 Running 5 (117s ago) 13m
kubeflow ml-pipeline-visualizationserver-5577c64b45-d2v4b 2/2 Running 0 13m
kubeflow admission-webhook-deployment-cb6db9648-78rtl 0/1 ImagePullBackOff 0 13m
kubeflow kserve-models-web-app-f9c576856-88qdc 1/2 ImagePullBackOff 1 (5m38s ago) 13m
kubeflow centraldashboard-dd9c778b6-78snk 1/2 ImagePullBackOff 1 (5m38s ago) 13m
kubeflow ml-pipeline-ui-5ddb5b76d8-89hdf 2/2 Running 7 (2m15s ago) 13m
kubeflow jupyter-web-app-deployment-cc9cbc696-bvb48 1/2 ImagePullBackOff 1 (5m38s ago) 13m
kubeflow volumes-web-app-deployment-7b998df674-765sm 1/2 ImagePullBackOff 0 13m
kubeflow tensorboards-web-app-deployment-8474fd9569-4xnst 1/2 ImagePullBackOff 0 13m
kubeflow notebook-controller-deployment-699589b4f9-bb6fd 1/2 ImagePullBackOff 1 (5m38s ago) 13m
kubeflow profiles-deployment-74f656c59f-qbzlz 1/3 ImagePullBackOff 1 (5m38s ago) 13m
kubeflow tensorboard-controller-deployment-5655cc9dbb-5mvfg 2/3 ImagePullBackOff 0 13m
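With this many pods, it can help to list only the ones stuck on image pulls. A minimal sketch, assuming the column layout shown above (STATUS in the fourth column); image_pull_failures is a hypothetical helper name, not a kubectl feature:

```shell
# Hypothetical helper: reads `kubectl get pods -A --no-headers` output on stdin
# and prints namespace/name for pods whose STATUS (4th column) is an image-pull error.
image_pull_failures() {
  awk '$4 == "ImagePullBackOff" || $4 == "ErrImagePull" { print $1 "/" $2 }'
}

# Usage against a live cluster:
#   kubectl get pods -A --no-headers | image_pull_failures
```

Each printed namespace/name pair can then be passed to kubectl describe pod to see the pull error in the events.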
When I inspect one of the problematic pods (for example, tensorboard-controller-deployment-5655cc9dbb-5mvfg), I see the following events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 15m default-scheduler 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Warning FailedScheduling 8m10s default-scheduler 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Normal Scheduled 7m57s default-scheduler Successfully assigned kubeflow/tensorboard-controller-deployment-5655cc9dbb-5mvfg to lima-rancher-desktop
Warning FailedScheduling 16m default-scheduler 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Normal Created 7m43s kubelet Created container istio-init
Normal Pulled 7m43s kubelet Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
Normal Started 7m42s kubelet Started container istio-init
Normal Pulling 7m28s kubelet Pulling image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0"
Normal Pulled 7m8s kubelet Successfully pulled image "gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0" in 19.515117093s
Normal Created 7m7s kubelet Created container kube-rbac-proxy
Normal Created 7m6s kubelet Created container istio-proxy
Normal Pulled 7m6s kubelet Container image "docker.io/istio/proxyv2:1.16.0" already present on machine
Normal Started 7m6s kubelet Started container kube-rbac-proxy
Normal Started 7m5s kubelet Started container istio-proxy
Warning Unhealthy 7m2s (x2 over 7m3s) kubelet Readiness probe failed: Get "http://10.42.0.104:15021/healthz/ready": dial tcp 10.42.0.104:15021: connect: connection refused
Normal Pulling 6m20s (x3 over 7m39s) kubelet Pulling image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0"
Warning Failed 6m8s (x2 over 7m28s) kubelet Failed to pull image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0": no match for platform in manifest: not found
Warning Failed 6m8s (x2 over 7m28s) kubelet Error: ErrImagePull
Warning Failed 5m55s (x3 over 6m47s) kubelet Error: ImagePullBackOff
Normal BackOff 81s (x20 over 6m47s) kubelet Back-off pulling image "docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc
It looks like the image cannot be found in the Docker registry. Any idea how I should proceed?
Upvotes: 1
Views: 307
Reputation: 37
After further research, I found that the image I need is not published for the arm64 architecture of an M1 Mac, which matches the "no match for platform in manifest" error. I need full virtualization to make it work.
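One way to confirm this kind of platform mismatch is to compare the architectures the image is published for with the architecture of the node. A minimal sketch, assuming the docker CLI is installed and can reach the registry; manifest_has_arch is a hypothetical helper name, not part of docker or kubectl:

```shell
# Hypothetical helper: reads manifest JSON on stdin and succeeds (exit 0)
# if it lists the architecture given as the first argument.
manifest_has_arch() {
  grep -Eq "\"architecture\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Usage (needs network access):
#   docker manifest inspect --verbose \
#     docker.io/kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0 \
#     | manifest_has_arch arm64 && echo "arm64 available" || echo "no arm64 variant"
#
# Architecture reported by the cluster nodes:
#   kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
```

If the node reports arm64 and the manifest does not list it, the pull fails exactly as in the events above.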
Upvotes: 1
Reputation: 1187
The error is self-explanatory: the image reference in the deployment's image field does not seem to resolve.
Try changing the image from docker.io/... to kubeflownotebookswg/tensorboard-controller:v1.7.0-rc.0
Behind the scenes, Kubernetes performs the equivalent of docker pull ... against Docker Hub.
Upvotes: 0