Reputation: 690
I have a cloud-init file that sets up all requirements for our AWS instances, and part of those requirements is formatting and mounting an EBS volume. The issue is that on some instances the volume attachment occurs after the instance is up, so when cloud-init executes, the volume /dev/xvdf does not yet exist and it fails.
I have something like:
#cloud-config
resize_rootfs: false
disk_setup:
  /dev/xvdf:
    table_type: 'gpt'
    layout: true
    overwrite: false
fs_setup:
  - label: DATA
    filesystem: 'ext4'
    device: '/dev/xvdf'
    partition: 'auto'
mounts:
  - [xvdf, /data, auto, "defaults,discard", "0", "0"]
I would like to have something like a sleep 60, or something to that effect, before the disk configuration block. If the whole cloud-init execution can be delayed, that would also work for me.
Also, I'm using Terraform to create the infrastructure.
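For reference, the volume attachment is a separate Terraform resource along these lines (simplified; resource names here are illustrative):

# Illustrative only: the volume is attached by a separate resource, so it
# can arrive after the instance has already booted and started cloud-init.
resource "aws_ebs_volume" "data" {
  availability_zone = aws_instance.app.availability_zone
  size              = 100
}

resource "aws_volume_attachment" "data" {
  device_name = "/dev/xvdf"
  volume_id   = aws_ebs_volume.data.id
  instance_id = aws_instance.app.id
}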
Thanks!
Upvotes: 8
Views: 6444
Reputation: 24806
Building up from this other answer.
There are two solutions, depending on the cloud-init version:

- device_aliases, disk_setup, fs_setup and mounts with x-systemd.device-timeout (cloud-init >= 24.2)
- disk_setup and mounts with x-systemd.device-timeout and x-systemd.makefs (these options substitute for fs_setup, which does not work with NVMe partitions on cloud-init < 24.2)

If you can use cloud-init 24.2 (released July 2024), you can partition, format and mount EBS volumes (which are exposed as NVMe devices on AWS Nitro instances; see Amazon EBS and NVMe) like this (tested on Fedora 41 Rawhide 20240711):
#cloud-config
device_aliases:
  disk1: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30
disk_setup:
  disk1:
    table_type: gpt
    layout: [50, 25, 25]
    overwrite: false
fs_setup:
  - label: disk1-earth
    filesystem: xfs
    device: disk1
    partition: 1
  - label: disk1-mars
    filesystem: xfs
    device: disk1
    partition: 2
  - label: disk1-venus
    filesystem: xfs
    device: disk1
    partition: 3
mounts:
  - [ LABEL=disk1-earth, /earth, xfs, "defaults,nofail,x-systemd.device-timeout=30" ]
  - [ LABEL=disk1-mars, /mars, xfs, "defaults,nofail,x-systemd.device-timeout=30" ]
  - [ LABEL=disk1-venus, /venus, xfs, "defaults,nofail,x-systemd.device-timeout=30" ]
mount_default_fields: [ None, None, "auto", "defaults,nofail", "0", "2" ]
cloud-init 24.2 is required if you want to partition the disk, since previous versions do not work with NVMe (see #5246, which was fixed by #5263 and released in cloud-init 24.2).
If you don't need several partitions, you can use any reasonably recent cloud-init.
The x-systemd.device-timeout=30 in the mount options tells systemd to wait up to 30 seconds for the device to become available, providing the delay the OP requested.
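For illustration, the first mounts entry above should translate into an /etc/fstab line roughly like the following (cloud-init typically also tags entries it manages with a comment=cloudconfig option; the exact line may vary by version):

LABEL=disk1-earth /earth xfs defaults,nofail,x-systemd.device-timeout=30,comment=cloudconfig 0 2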
You can verify the proper partitioning, formatting and mounting afterwards with the following commands:
sudo blkid -s LABEL
lsblk -o name,size,mountpoint,label
findmnt --fstab
If your distro does not have cloud-init 24.2, you can't use fs_setup for NVMe with partitions (see bug #5246, which was fixed by #5263 and released in cloud-init 24.2).
Since you can't use fs_setup, you need to use x-systemd.makefs in the mount options. fs_setup also served to assign a filesystem label, which you can't do via mount options, so you "lose" the ability to give the partitions labels.
#cloud-config
device_aliases:
  disk1: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30
disk_setup:
  disk1:
    table_type: gpt
    layout: [50, 25, 25]
    overwrite: false
mounts:
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part1, /earth, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs" ]
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part2, /mars, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs" ]
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part3, /venus, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs" ]
mount_default_fields: [ None, None, "auto", "defaults,nofail", "0", "2" ]
You can verify the proper partitioning, formatting and mounting afterwards with the following commands:
lsblk -o name,size,mountpoint,label
findmnt --fstab
Upvotes: 0
Reputation: 28533
I know this already has an accepted answer, but I just went through this exercise and solved it in a slightly different way: by waiting for the disk instead of rebooting and running again. Here is my solution:
#cloud-config
bootcmd:
  - |
    timeout 30s sh -c 'while [ ! -e /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${volid} ]; do sleep 1; done'
device_aliases:
  my_data: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${volid}
disk_setup:
  my_data:
    table_type: gpt
    layout: true
    overwrite: false
fs_setup:
  - label: my_data
    filesystem: xfs
    partition: any
    device: my_data
    overwrite: false
mounts:
  - [my_data, /opt/splunk, xfs]
My provisioner (in this case Terraform) replaces ${volid} with the volume ID that I expect to be attached to the instance (which comes from an expression like replace(aws_ebs_volume.splunk_data[count.index].id, "-", "")). This may be helpful to someone as an alternative way of achieving the goal.
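A minimal sketch of how that substitution could be wired up, assuming a template file named cloud-init.yaml.tpl containing the #cloud-config above; the resource names here are illustrative, only the replace() expression comes from the answer:

# Hypothetical wiring: render the cloud-init user data with the volume ID.
resource "aws_instance" "splunk" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = "m5.large"

  user_data = templatefile("${path.module}/cloud-init.yaml.tpl", {
    # EBS volume IDs look like "vol-0abc...", while the NVMe by-id link
    # uses "vol0abc..." without the dash, hence the replace().
    volid = replace(aws_ebs_volume.splunk_data[count.index].id, "-", "")
  })
}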
Upvotes: 1
Reputation: 588
I guess cloud-init does have an option for running ad hoc commands; have a look at this link:
https://cloudinit.readthedocs.io/en/latest/topics/modules.html?highlight=runcmd#runcmd
Not sure what your code looks like, but I just passed the below as user_data in AWS and could see that the init script sleeps for 1000 seconds (I just added a couple of echo statements to check later). You can add a little more logic as well to verify the presence of the volume.
#cloud-config
runcmd:
  - [ sh, -c, "echo before sleep:`date` >> /tmp/user_data.log" ]
  - [ sh, -c, "sleep 1000" ]
  - [ sh, -c, "echo after sleep:`date` >> /tmp/user_data.log" ]
<Rest of the script>
Upvotes: 4
Reputation: 690
I was able to resolve the issue with two changes:

- Adding the nofail mount option.
- Adding a runcmd block that deletes the semaphore file for disk_setup.

So my new cloud-init file now looks like this:
#cloud-config
resize_rootfs: false
disk_setup:
  /dev/xvdf:
    table_type: 'gpt'
    layout: true
    overwrite: false
fs_setup:
  - label: DATA
    filesystem: 'ext4'
    device: '/dev/xvdf'
    partition: 'auto'
mounts:
  - [xvdf, /data, auto, "defaults,nofail,discard", "0", "0"]
runcmd:
  - [rm, -f, /var/lib/cloud/instances/*/sem/config_disk_setup]
power_state:
  mode: reboot
  timeout: 30
It will reboot, then execute the disk_setup module once more. By that time the volume will be attached, so the operation won't fail.
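For reference, cloud-init keeps those per-module semaphore files in the per-instance sem directory, so you can check what it considers already done with something like:

ls /var/lib/cloud/instances/*/sem/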
I guess this is kind of a hacky way to solve this, so if someone has a better answer (like how to delay the whole cloud-init execution), please share it.
Upvotes: 5