spitfiredd

Reputation: 3125

AWS Batch job - no space left on device, but EBS autoscaling is enabled

My AWS Batch jobs are failing with java.lang.RuntimeException: java.io.IOException: No space left on device. I thought the EBS volume used as the mount dir had EBS autoscaling enabled, so I did not expect it to fill up.

The job runs bbnorm.sh (from BBTools) on two paired fq.gz files, each approximately 22 GB in size.

The base project that this template came from can be found here: Genomics Secondary Analysis Using AWS Step Functions and AWS Batch.

Here is my template:

Resources:
  LaunchTemplate:
    Type: "AWS::EC2::LaunchTemplate"
    Properties:
      LaunchTemplateData:
        BlockDeviceMappings:
          - Ebs:
              # root volume
              Encrypted: True
              DeleteOnTermination: True
              VolumeSize: 50
              VolumeType: gp2 
            DeviceName: /dev/xvda
          - Ebs:
              # ecs optimized ami docker storage volume, kept for compatibility
              Encrypted: True
              DeleteOnTermination: True
              VolumeSize: 22
              VolumeType: gp2 
            DeviceName: /dev/xvdcz
          - Ebs:
              # docker storage volume (amazon-ebs-autoscale managed)
              Encrypted: True
              DeleteOnTermination: True
              VolumeSize: 100
              VolumeType: gp2 
            DeviceName: /dev/sdc
        TagSpecifications:
          - ResourceType: volume
            Tags:
              - Key: Project
                Value: !Ref Project
              - Key: SolutionId
                Value: !FindInMap ['solution', 'metadata', 'id']
        UserData:
          Fn::Base64: |
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="==BOUNDARY=="

            --==BOUNDARY==
            Content-Type: text/cloud-config; charset="us-ascii"

            packages:
            - jq
            - btrfs-progs
            - wget
            - git
            - bzip2

            runcmd:
            - pip3 install -U awscli boto3

            - systemctl stop ecs
            - systemctl stop docker

            # install amazon-ebs-autoscale
            - cp -au /var/lib/docker /var/lib/docker.bk
            - rm -rf /var/lib/docker/*
            - EBS_AUTOSCALE_VERSION=$(curl --silent "https://api.github.com/repos/awslabs/amazon-ebs-autoscale/releases/latest" | jq -r .tag_name)
            - cd /opt && git clone https://github.com/awslabs/amazon-ebs-autoscale.git
            - cd /opt/amazon-ebs-autoscale && git checkout $EBS_AUTOSCALE_VERSION
            - sh /opt/amazon-ebs-autoscale/install.sh /var/lib/docker /dev/sdc > /var/log/ebs-autoscale-install.log 2>&1
            - sed -i 's+OPTIONS=.*+OPTIONS="--storage-driver btrfs"+g' /etc/sysconfig/docker-storage
            - cp -au /var/lib/docker.bk/* /var/lib/docker
            
            # install miniconda/awscli
            - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
            - bash Miniconda3-latest-Linux-x86_64.sh -b -f -p /opt/miniconda
            - /opt/miniconda/bin/conda install -c conda-forge -y awscli
            - chown -R ec2-user:ec2-user /opt/miniconda
            - rm Miniconda3-latest-Linux-x86_64.sh

            - trap "systemctl start docker;systemctl enable --now --no-block ecs" INT ERR EXIT

            --==BOUNDARY==--

Upvotes: 1

Views: 2412

Answers (1)

spitfiredd

Reputation: 3125

After some research, I found that it was much easier to update the launch template with an increased volume size.

AWS Batch doesn't support updating a compute environment with a new launch template version. If you update your launch template, you must create a new compute environment with the new template for the changes to take effect.

https://docs.aws.amazon.com/batch/latest/userguide/launch-templates.html
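As a rough sketch of what that change might look like, here is the third block device mapping from the question's template with a larger starting size. Which volume was enlarged and by how much is not stated above, so both the choice of /dev/sdc and the value 500 are assumptions for illustration only. The other two mappings and the rest of the template are omitted here and would stay as they are.

```yaml
# Sketch only: fragment of the launch template from the question.
# The other BlockDeviceMappings entries, TagSpecifications and UserData are omitted.
Resources:
  LaunchTemplate:
    Type: "AWS::EC2::LaunchTemplate"
    Properties:
      LaunchTemplateData:
        BlockDeviceMappings:
          - Ebs:
              # docker storage volume (amazon-ebs-autoscale managed)
              Encrypted: True
              DeleteOnTermination: True
              # Was 100 in the question. 500 is a hypothetical value; pick a size
              # that covers the compressed inputs plus the job's scratch output.
              VolumeSize: 500
              VolumeType: gp2
            DeviceName: /dev/sdc
```

Per the documentation quoted above, the new size would only apply to instances launched by a compute environment created with the updated template.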

Upvotes: 1
