njho

Reputation: 2158

How do you run a COS VM on GCP with a GPU and a private repo?

We're trying to run a COS VM with a GPU and a container image pulled from a private Artifact Registry repository.

Running into several issues including

Upvotes: 0

Views: 471

Answers (1)

njho

Reputation: 2158

You can only do this with the #cloud-config solution, but the solution here is also incomplete.

Please see below. The future is yours. It covers gotchas like:

  • Adding user to Docker user group
  • Authenticating with the docker registry
  • Calling with the necessary user permissions
  • How to call it with cloud-config

#cloud-config

users:
  - name: cloudservice
    uid: 2000
    groups: docker # Add the user to the Docker group

write_files:
  - path: /etc/systemd/system/install-gpu.service
    permissions: 0644
    owner: root
    content: |
      [Unit]
      Description=Install GPU drivers
      Wants=gcr-online.target docker.socket
      After=gcr-online.target docker.socket

      [Service]
      User=root
      Type=oneshot
      ExecStart=/usr/bin/cos-extensions install gpu
      StandardOutput=journal+console
      StandardError=journal+console
  - path: /etc/systemd/system/cloudservice.service
    permissions: 0644
    owner: root
    content: |
      [Unit]
      Description=Run a myapp GPU application container
      Requires=install-gpu.service
      After=install-gpu.service

      [Service]
      ExecStartPre=sudo -u cloudservice /usr/bin/docker-credential-gcr configure-docker --registries us-central1-docker.pkg.dev
      ExecStart=sudo -u cloudservice -E /usr/bin/docker run --rm -u 2000 --name=mycloudservice --device /dev/nvidia0:/dev/nvidia0 us-central1-docker.pkg.dev/xxxxxxxxxxx:latest
      ExecStop=sudo -u cloudservice /usr/bin/docker stop mycloudservice
      ExecStopPost=sudo -u cloudservice /usr/bin/docker rm mycloudservice

runcmd:
  - systemctl daemon-reload
  - systemctl start install-gpu.service
  - systemctl start cloudservice.service
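
One thing that is easy to get wrong: cloud-init only treats the user-data as cloud-config if the very first line is exactly `#cloud-config` (with the hyphen). Here is a minimal pre-flight sketch you could run locally before creating the VM. The function name `check_cloud_config` is my own, not part of gcloud or COS, and it only checks the header line, not the YAML itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of gcloud or COS): returns 0 only if the
# file is readable and its first line is exactly "#cloud-config".
check_cloud_config() {
  file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 1
  fi
  first_line=$(head -n 1 "$file")
  if [ "$first_line" != "#cloud-config" ]; then
    echo "first line must be #cloud-config, got: $first_line" >&2
    return 1
  fi
  return 0
}
```

You would call it as `check_cloud_config cloudconfig.yaml` (file name assumed) and only go on to create the instance if it returns 0.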

Then create the instance, passing the cloud-config as user-data:

    gcloud compute instances create winner \
      --image-family cos-stable \
      --image-project cos-cloud \
      --metadata-from-file user-data=${path_to_cloudconfig_yaml} \
      --zone ${COMPUTE_ZONE} \
      --machine-type ${machinetype} \
      --boot-disk-size ${bootdisksize} \
      --tags ${network_tags} \
      --maintenance-policy TERMINATE --restart-on-failure \
      --metadata google-logging-enabled=${googleloggingenabled},google-monitoring-enabled=${googlemonitoringenabled},cos-metrics-enabled=true \
      --scopes=https://www.googleapis.com/auth/cloud-platform \
      --accelerator type=${gpuType},count=${gpuCount}
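
Once the VM has booted, it helps to confirm that both units from the cloud-config actually ran. Below is a rough sketch to run on the VM itself (over SSH). The function name `report_failed_units` is made up for illustration; it only relies on `systemctl is-failed` and `journalctl -u`, and you may need `sudo` to read the journal. Note that `install-gpu.service` is a oneshot unit, so after a successful run it shows as inactive rather than active, which is why this checks for "failed" instead of "active".

```shell
#!/bin/sh
# Sketch only: for each unit name given, print the last journal lines if
# systemd reports the unit as failed. Returns non-zero if any unit failed.
report_failed_units() {
  rc=0
  for unit in "$@"; do
    if systemctl is-failed --quiet "$unit"; then
      echo "FAILED: $unit"
      # Show recent log output for the failed unit.
      journalctl -u "$unit" --no-pager -n 50
      rc=1
    else
      echo "not failed: $unit"
    fi
  done
  return "$rc"
}
```

With the unit names used in the cloud-config, the call would be `report_failed_units install-gpu.service cloudservice.service`.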

Upvotes: 2
