Reputation: 2158
We're trying to run a COS (Container-Optimized OS) VM with a GPU and a container pulled from Artifact Registry.
Running into several issues including
Upvotes: 0
Views: 471
Reputation: 2158
You can only do this with the #cloud-config solution, but the solution here is also incomplete.
Please see below. The future is yours. Includes gotchas like:
#cloud-config
users:
- name: cloudservice
  uid: 2000
  groups: docker # Add the user to the Docker group
write_files:
# Unit that installs the GPU drivers before the application starts
- path: /etc/systemd/system/install-gpu.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Install GPU drivers
    Wants=gcr-online.target docker.socket
    After=gcr-online.target docker.socket
    [Service]
    User=root
    Type=oneshot
    ExecStart=/usr/bin/cos-extensions install gpu
    StandardOutput=journal+console
    StandardError=journal+console
# Unit that runs the application container as the cloudservice user
- path: /etc/systemd/system/cloudservice.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Run a myapp GPU application container
    Requires=install-gpu.service
    After=install-gpu.service
    [Service]
    ExecStartPre=sudo -u cloudservice /usr/bin/docker-credential-gcr configure-docker --registries us-central1-docker.pkg.dev
    ExecStart=sudo -u cloudservice -E /usr/bin/docker run --rm -u 2000 --name=mycloudservice --device /dev/nvidia0:/dev/nvidia0 us-central1-docker.pkg.dev/xxxxxxxxxxx:latest
    ExecStop=sudo -u cloudservice /usr/bin/docker stop mycloudservice
    ExecStopPost=sudo -u cloudservice /usr/bin/docker rm mycloudservice
runcmd:
# Reload systemd so it picks up the unit files written above
- systemctl daemon-reload
- systemctl start install-gpu.service
- systemctl start cloudservice.service
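One thing worth checking before creating the VM: cloud-init only treats the user-data as cloud-config when the very first line of the file is exactly `#cloud-config`. A minimal pre-flight sketch follows; `check_cloud_config` is a hypothetical helper name of mine, not part of gcloud or COS, and it only checks the header line, not the YAML itself.

```shell
#!/bin/sh
# Hypothetical helper (not a gcloud/COS tool): verify that a user-data file
# starts with the "#cloud-config" header line.
check_cloud_config() {
  file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 2
  fi
  # Only the first line matters for this check.
  first_line=$(head -n 1 "$file")
  if [ "$first_line" = "#cloud-config" ]; then
    echo "ok: $file"
  else
    echo "missing '#cloud-config' header: $file" >&2
    return 1
  fi
}
```

Usage would be something like `check_cloud_config ./cloudconfig.yaml && gcloud compute instances create ...`, where the file path is whatever you pass to `--metadata-from-file user-data=`.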
gcloud compute instances create winner \
  --image-family cos-stable \
  --image-project cos-cloud \
  --metadata-from-file user-data=${path-to-cloudconfig.yaml} \
  --zone ${COMPUTE_ZONE} \
  --machine-type ${machinetype} \
  --boot-disk-size ${bootdisksize} \
  --scopes cloud-platform \
  --tags ${network_tags} \
  --maintenance-policy TERMINATE --restart-on-failure \
  --metadata google-logging-enabled=${googleloggingenabled},google-monitoring-enabled=${googlemonitoringenabled},cos-metrics-enabled=true \
  --accelerator type=${gpuType},count=${gpuCount}
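Once the instance is up, it helps to confirm that both units actually ran. A rough sketch of what I would run on the VM over SSH, assuming the unit and container names from the cloud-config above; `verify_gpu_setup` is just a hypothetical wrapper name, and the `tail` length is arbitrary.

```shell
#!/bin/sh
# Hypothetical wrapper: run this on the COS VM itself (e.g. after SSHing in).
verify_gpu_setup() {
  # State of the driver-install unit and the application unit.
  sudo systemctl status install-gpu.service cloudservice.service --no-pager
  # Last lines of the driver installer's log output.
  sudo journalctl -u install-gpu.service --no-pager | tail -n 50
  # The device node passed to the container via --device should exist.
  ls -l /dev/nvidia*
  # Is the application container running?
  sudo docker ps --filter name=mycloudservice
}
```

If `/dev/nvidia0` is missing, the `install-gpu.service` log is the first place to look, since the container unit depends on it.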
Upvotes: 2