Reputation: 690
I have a cloud-init file that sets up all requirements for our AWS instances, and part of those requirements is formatting and mounting an EBS volume. The issue is that on some instances the volume attachment occurs after the instance is up, so when cloud-init executes, the volume /dev/xvdf does not yet exist and it fails.
I have something like:
#cloud-config
resize_rootfs: false
disk_setup:
  /dev/xvdf:
    table_type: 'gpt'
    layout: true
    overwrite: false
fs_setup:
  - label: DATA
    filesystem: 'ext4'
    device: '/dev/xvdf'
    partition: 'auto'
mounts:
  - [xvdf, /data, auto, "defaults,discard", "0", "0"]
I would like to have something like a sleep 60, or something to that effect, before the disk configuration block. If the whole cloud-init execution can be delayed, that would also work for me.
Also, I'm using Terraform to create the infrastructure.
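For reference, the volume attachment is a separate Terraform resource along these lines (simplified; resource names here are illustrative):

# Illustrative only: the volume is attached by a separate resource, so it
# can arrive after the instance has already booted and started cloud-init.
resource "aws_ebs_volume" "data" {
  availability_zone = aws_instance.app.availability_zone
  size              = 100
}

resource "aws_volume_attachment" "data" {
  device_name = "/dev/xvdf"
  volume_id   = aws_ebs_volume.data.id
  instance_id = aws_instance.app.id
}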
Thanks!
Upvotes: 8
Views: 6444
Reputation: 24806
Building up from this other answer.
There are two solutions, depending on the cloud-init version:

- device_aliases, disk_setup, fs_setup and mounts with x-systemd.device-timeout (cloud-init >= 24.2)
- disk_setup and mounts with x-systemd.device-timeout and x-systemd.makefs (these options substitute for fs_setup, which does not work with NVMe partitions on cloud-init < 24.2)

If you can use cloud-init 24.2 (released July 2024), you can partition, format and mount EBS volumes (which are exposed as NVMe devices on AWS Nitro instances; see Amazon EBS and NVMe) like this (tested on Fedora 41 Rawhide 20240711):
#cloud-config
device_aliases:
  disk1: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30
disk_setup:
  disk1:
    table_type: gpt
    layout: [50, 25, 25]
    overwrite: false
fs_setup:
  - label: disk1-earth
    filesystem: xfs
    device: disk1
    partition: 1
  - label: disk1-mars
    filesystem: xfs
    device: disk1
    partition: 2
  - label: disk1-venus
    filesystem: xfs
    device: disk1
    partition: 3
mounts:
  - [ LABEL=disk1-earth, /earth, xfs, "defaults,nofail,x-systemd.device-timeout=30" ]
  - [ LABEL=disk1-mars, /mars, xfs, "defaults,nofail,x-systemd.device-timeout=30" ]
  - [ LABEL=disk1-venus, /venus, xfs, "defaults,nofail,x-systemd.device-timeout=30" ]
mount_default_fields: [ None, None, "auto", "defaults,nofail", "0", "2" ]
cloud-init 24.2 is required if you want to partition the disk, since previous versions do not work with NVMe (see #5246, which was fixed by #5263 and released in cloud-init 24.2).
If you don't need several partitions, you can use any reasonably recent cloud-init.
The x-systemd.device-timeout=30 in the mount options tells systemd to wait up to 30 seconds for the device to become available, providing the delay the OP requested.
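For illustration, the first mounts entry above should translate into an /etc/fstab line roughly like the following (cloud-init typically also tags entries it manages with a comment=cloudconfig option; the exact line may vary by version):

LABEL=disk1-earth /earth xfs defaults,nofail,x-systemd.device-timeout=30,comment=cloudconfig 0 2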
You can verify the proper partitioning, formatting and mounting afterwards with the following commands:
sudo blkid -s LABEL
lsblk -o name,size,mountpoint,label
findmnt --fstab
If your distro does not have cloud-init 24.2, you can't use fs_setup for NVMe with partitions (see bug #5246, which was fixed by #5263 and released in cloud-init 24.2).
Since you can't use fs_setup, you need to use x-systemd.makefs in the mount options. fs_setup also served to assign a filesystem label, which you can't do via mount options, so you "lose" the ability to give the partitions labels.
#cloud-config
device_aliases:
  disk1: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30
disk_setup:
  disk1:
    table_type: gpt
    layout: [50, 25, 25]
    overwrite: false
mounts:
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part1, /earth, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs" ]
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part2, /mars, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs" ]
  - [ /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0a250869ccd411b30-part3, /venus, xfs, "defaults,nofail,x-systemd.device-timeout=30s,x-systemd.makefs" ]
mount_default_fields: [ None, None, "auto", "defaults,nofail", "0", "2" ]
You can verify the proper partitioning, formatting and mounting afterwards with the following commands:
lsblk -o name,size,mountpoint,label
findmnt --fstab
Upvotes: 0
Reputation: 28533
I know this already has an accepted answer, but I just went through this exercise and solved it in a slightly different way: by waiting for the disk instead of rebooting and running again. Here is my solution:
#cloud-config
bootcmd:
  - |
    timeout 30s sh -c 'while [ ! -e /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${volid} ]; do sleep 1; done'
device_aliases:
  my_data: /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${volid}
disk_setup:
  my_data:
    table_type: gpt
    layout: true
    overwrite: false
fs_setup:
  - label: my_data
    filesystem: xfs
    partition: any
    device: my_data
    overwrite: false
mounts:
  - [my_data, /opt/splunk, xfs]
My provisioner (in this case Terraform) replaces ${volid} with the volume ID that I expect to be attached to the instance (which comes from an expression like replace(aws_ebs_volume.splunk_data[count.index].id, "-", "")). This may be helpful to someone as an alternative way of achieving the goal.
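A minimal sketch of how that substitution could be wired up, assuming a template file named cloud-init.yaml.tpl containing the #cloud-config above; the resource names here are illustrative, only the replace() expression comes from the answer:

# Hypothetical wiring: render the cloud-init user data with the volume ID.
resource "aws_instance" "splunk" {
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = "m5.large"

  user_data = templatefile("${path.module}/cloud-init.yaml.tpl", {
    # EBS volume IDs look like "vol-0abc...", while the NVMe by-id link
    # uses "vol0abc..." without the dash, hence the replace().
    volid = replace(aws_ebs_volume.splunk_data[count.index].id, "-", "")
  })
}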
Upvotes: 1
Reputation: 588
I guess cloud-init does have an option for running ad hoc commands; have a look at this link:
https://cloudinit.readthedocs.io/en/latest/topics/modules.html?highlight=runcmd#runcmd
Not sure what your code looks like, but I just passed the below as user_data in AWS and could see that the init script sleeps for 1000 seconds (I just added a couple of echo statements to check later). You can add a little more logic as well to verify the presence of the volume.
#cloud-config
runcmd:
  - [ sh, -c, "echo before sleep:`date` >> /tmp/user_data.log" ]
  - [ sh, -c, "sleep 1000" ]
  - [ sh, -c, "echo after sleep:`date` >> /tmp/user_data.log" ]
<Rest of the script>
Upvotes: 4
Reputation: 690
I was able to resolve the issue with two changes:

- Adding the nofail mount option.
- Adding a runcmd block that deletes the semaphore file for disk_setup.

So my new cloud-init file now looks like this:
#cloud-config
resize_rootfs: false
disk_setup:
  /dev/xvdf:
    table_type: 'gpt'
    layout: true
    overwrite: false
fs_setup:
  - label: DATA
    filesystem: 'ext4'
    device: '/dev/xvdf'
    partition: 'auto'
mounts:
  - [xvdf, /data, auto, "defaults,nofail,discard", "0", "0"]
runcmd:
  - [rm, -f, /var/lib/cloud/instances/*/sem/config_disk_setup]
power_state:
  mode: reboot
  timeout: 30
It will reboot, then execute the disk_setup module once more. By that time the volume will be attached, so the operation won't fail.
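For reference, cloud-init keeps those per-module semaphore files in the per-instance sem directory, so you can check what it considers already done with something like:

ls /var/lib/cloud/instances/*/sem/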
I guess this is kind of a hacky way to solve this, so if someone has a better answer (like how to delay the whole cloud-init execution), please share it.
Upvotes: 5