Nick Fernandez

Reputation: 1320

AWS CannotPullContainerError no space left on device Docker

I'm trying to use a large Docker image (about 18 GB, hosted on Docker Hub) as the image for an AWS Batch job definition. The job fails with the following out-of-space error:

CannotPullContainerError: write /var/lib/docker/tmp/GetImageBlob#######: no space left on device

The CloudFormation JSON section that defines the job is:

    "JobDef3": {
      "Type": "AWS::Batch::JobDefinition",
      "Properties": {
        "Type": "container",
        "ContainerProperties": {
          "Image": {
            "Fn::Join": [
              "",
              [
                "cornhundred/",
                "dockerized-cellranger-nick:latest"
              ]
            ]
          },
          "Vcpus": 1,
          "Command": ["some command"],
          "Memory": 3000
        },
        "RetryStrategy": {
          "Attempts": 1
        }
      }
    },

How can I get AWS to increase the amount of space available so that I can run this image?

Upvotes: 1

Views: 3915

Answers (2)

CAMD_3441

Reputation: 3154

I had a similar issue. Cleaning up unused Docker images and volumes didn't work for me (i.e. neither docker container prune nor docker system prune helped).

I saw another page saying that restarting Docker fixed it for that user, but when I ran service docker restart I got this error: /etc/init.docker: line 35: ulimit: open files: cannot modify limit: Operation not permitted

To fix that, I found sites suggesting that I raise the ulimit values in some configuration files, but when I tried to save the file with the updated parameters I got a write error (file system full?).

At that point I realized that, as the error you posted already says, I needed to free up disk space by removing files.

I ran du -h from the root folder and saw that /var/lib/docker/tmp/ (the folder named in both my error message and yours) used far more disk space than any other folder.

So I removed the older files there, and the error went away.
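
For anyone repeating these steps, here is a rough sketch of the inspection part as shell functions. The function names are my own, and the script only lists things; it deletes nothing. It assumes you have a shell on the host that runs Docker (for AWS Batch that would mean logging in to the underlying instance) and, for /var/lib/docker, root privileges.

```shell
#!/bin/sh
# Sketch: find out where disk space is going under a Docker data directory.
# Helper names are hypothetical; nothing here removes any files.

# Print free space on the filesystem that holds the given directory.
show_free_space() {
    df -h "$1"
}

# List a directory and its immediate subdirectories by size in KiB,
# largest first. The first line is the total for the directory itself.
# $2 limits the number of lines (default 10).
largest_subdirs() {
    du -k -d 1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# List regular files selected by find's "-mtime +N" test, i.e. files whose
# age in whole days is greater than N (default 1). These are only
# candidates for removal; review them before deleting anything.
old_files() {
    find "$1" -type f -mtime +"${2:-1}" -print
}

# Example usage (run as root on the Docker host):
#   show_free_space /var/lib/docker
#   largest_subdirs /var/lib/docker 10
#   old_files /var/lib/docker/tmp 1
```

I would only remove files from /var/lib/docker/tmp while no pull is in progress, since an active pull may still be writing there.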

Upvotes: 0

Nick Fernandez

Reputation: 1320

I was able to run the Docker container by moving the large files (~15 GB of reference genome files) out of the Docker image and downloading them after the container starts. I also needed to make a custom Amazon Machine Image (AMI; see AWS Batch Genomics for an example) and attach a volume to hold the large reference genome files, since the default container was not large enough.
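
A minimal sketch of the "download after the container starts" step, in case it helps. This is not my exact setup. It assumes the reference files live in S3, that the AWS CLI is installed in the image, and that the job's role can read the bucket. The bucket name, mount point, and function name below are placeholders.

```shell
#!/bin/sh
# Sketch: fetch large reference files at container start instead of
# baking them into the image. Assumes the AWS CLI is available and the
# files are stored in S3 (both are assumptions, not part of the answer).

# fetch_reference <s3-uri> <target-dir>
fetch_reference() {
    src="$1"
    dest="$2"
    # Skip the download when the target directory already contains files.
    if [ -d "$dest" ] && [ -n "$(ls -A "$dest")" ]; then
        echo "reference data already present in $dest"
        return 0
    fi
    mkdir -p "$dest" || return 1
    # Copy everything under the S3 prefix into the target directory.
    aws s3 cp "$src" "$dest" --recursive
}

# Example usage in a container entrypoint (placeholder bucket and path):
#   fetch_reference "s3://example-bucket/refdata/" "/mnt/reference" && exec "$@"
```

The target directory should be on the attached volume rather than the container's own filesystem, otherwise the download hits the same space limit as the original image.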

Upvotes: 3
