Mike

Reputation: 582

Multiple Containers Sharing Single GPU

I have configured an ECS service running on a g4dn.xlarge instance, which has a single GPU. Inside the task definition, I specify the container definition's resource requirement to use one GPU, like so:

"resourceRequirements": [
  {
    "type":"GPU",
    "value": "1"
  }
]

Running one task and one container on this instance works fine. When I set the service's desired task count to 2, I receive an event on the service that states:

service was unable to place a task because no container instance met all of its requirements. The closest matching container-instance has insufficient GPU resource available.

According to the AWS docs:

Amazon ECS will schedule to available GPU-enabled container instances and pin physical GPUs to proper containers for optimal performance.

Is there any way to override this default behavior and force ECS to allow multiple containers to share a single GPU?

I don't believe we will run into performance issues with sharing, as we plan to use each container for H264 encoding (NVENC), which does not use CUDA. If anyone can direct me to documentation on the performance of CUDA in containers sharing a GPU, that would also be appreciated.

Upvotes: 7

Views: 3423

Answers (1)

flow

Reputation: 71

The trick is to enable the NVIDIA Docker runtime by default for all containers, if that is suitable for your use case.

Based on the Amazon AMI amazon/amzn2-ami-ecs-gpu-hvm-2.0.20200218-x86_64-ebs, connect to the instance and add the configuration below:

# Write the daemon config as root; a plain `sudo cat ... > file` would fail
# because the redirection runs in the non-root shell
sudo tee /etc/docker/daemon.json <<"EOF"
{
  "default-runtime": "nvidia",
  "runtimes": {
      "nvidia": {
        "path": "/etc/docker-runtimes.d/nvidia"
      }
  }
}
EOF
# Reload the daemon configuration and check the logs for errors
sudo pkill -SIGHUP dockerd
sudo tail -10 /var/log/messages
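To confirm the change took effect, you can check the daemon's reported default runtime and run a throwaway container on the instance. This is only a sanity check; the nvidia/cuda image tag below is an example and may need adjusting:

# "Default Runtime" should now report nvidia
sudo docker info | grep -i runtime
# nvidia-smi should see the GPU even though no --gpus or --runtime flag is passed
sudo docker run --rm nvidia/cuda:11.0-base nvidia-smi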

Create a new AMI from this instance, and don't specify any GPU values in the resourceRequirements of the container definition.
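With the default runtime in place, the containers get GPU access without ECS reserving the GPU, so several copies of the task can be placed on the same instance. A minimal sketch of such a container definition, with no GPU entry under resourceRequirements (the name and image are placeholders):

"containerDefinitions": [
  {
    "name": "encoder",
    "image": "my-nvenc-image:latest",
    "essential": true
  }
]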

Upvotes: 7
