Nimitz14

Reputation: 2328

How can one keep the data on a local SSD between stopping and restarting an instance?

In my case I need only CPU compute for a while, and then at the end I need GPUs. So I run the instance with CPUs only, then stop it and restart it with GPUs added (and fewer CPUs). However, it seems this will erase the data on the local SSD. Is there any way around that? Could one, for example, back the data up first with a snapshot and then restore it to the local SSD after restarting the instance?

I have not tried using local SSDs yet. I want to know what would happen.

Upvotes: 3

Views: 859

Answers (2)

fantabolous

Reputation: 22716

In my experience, rebooting is typically fine, while shutting down always results in the data being purged.

The easiest way I've found to back up and restore is to copy to/from a persistent disk or Google Cloud Storage. gsutil rsync works well for this. I don't believe snapshots work with local SSDs.

From the Google docs: https://cloud.google.com/compute/docs/disks/local-ssd
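A minimal sketch of that backup/restore round trip, wrapping gsutil rsync in Python. The mount point and bucket name are hypothetical placeholders, and the sketch assumes gsutil is installed and authenticated on the VM:

```python
import subprocess

# Hypothetical locations; replace with your own mount point and bucket.
LOCAL_SSD_DIR = "/mnt/disks/ssd0/data"
BUCKET_URL = "gs://my-backup-bucket/ssd0-data"


def rsync_command(src, dst):
    """Build a gsutil command that recursively syncs src into dst."""
    # -m runs transfers in parallel; rsync -r recurses into subdirectories.
    return ["gsutil", "-m", "rsync", "-r", src, dst]


def backup(run=subprocess.run):
    """Copy the local SSD contents to Cloud Storage before stopping the VM."""
    return run(rsync_command(LOCAL_SSD_DIR, BUCKET_URL), check=True)


def restore(run=subprocess.run):
    """Copy the data back once the VM is running again with an empty local SSD."""
    return run(rsync_command(BUCKET_URL, LOCAL_SSD_DIR), check=True)
```

Call backup() before stopping the instance and restore() after it comes back up, once the new local SSD has been formatted and mounted at the same path. The same commands can of course be typed directly in a shell.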

Data on local SSDs persist only through the following events:

- If you reboot the guest operating system.
- If you configure your instance for live migration and the instance goes through a host maintenance event.
- If the host system experiences a host error, Compute Engine makes a best effort to reconnect to the VM and preserve the local SSD data, but might not succeed. If the attempt is successful, the VM restarts automatically. However, if the attempt to reconnect fails, the VM restarts without the data. While Compute Engine is recovering your VM and local SSD, which can take up to 60 minutes, the host system and the underlying drive are unresponsive. To configure how your VM instances behave in the event of a host error, see Setting instance availability policies.

Data on Local SSDs does not persist through the following events:

- If you shut down the guest operating system and force the instance to stop.
- If you configure the instance to be preemptible and the instance goes through the preemption process.
- If you configure the instance to stop on host maintenance events and the instance goes through a host maintenance event.
- If the host system experiences a host error, and the underlying drive does not recover within 60 minutes, Compute Engine does not attempt to preserve the data on your local SSD. While Compute Engine is recovering your VM and local SSD, which can take up to 60 minutes, the host system and the underlying drive are unresponsive.
- If you misconfigure the local SSD so that it becomes unreachable.
- If you disable project billing. The instance will stop and your data will be lost.

Upvotes: 0

rvs

Reputation: 1291

Your data may or may not survive a machine restart, depending on how lucky or unlucky you are. Moreover, if your VM crashes (e.g. if the underlying hardware fails), you may also lose the contents of the Local SSD at a random time.

I don't think Local SSD implements snapshots or any sort of data redundancy functionality. You can, however, implement your own: for example, you can partition your SSD using LVM, take LVM snapshots once in a while, and upload them to GCS or store them somewhere else.
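A rough sketch of what such a do-it-yourself snapshot backup could look like. All names here (volume group, logical volume, mount point, bucket) are hypothetical, and it assumes the local SSD is already an LVM logical volume with free space left in the volume group, the filesystem can be mounted read-only from a snapshot (e.g. ext4), and the script runs as root:

```python
import subprocess

# Hypothetical names; adjust to your own LVM layout and bucket.
VOLUME_GROUP = "vg_ssd"
LOGICAL_VOLUME = "data"
SNAPSHOT_NAME = "data_snap"
SNAPSHOT_MOUNT = "/mnt/data_snap"
BUCKET_URL = "gs://my-backup-bucket/data_snap"


def snapshot_backup_commands(cow_size="10G"):
    """Return the commands for one snapshot-and-upload cycle, in order."""
    origin = f"/dev/{VOLUME_GROUP}/{LOGICAL_VOLUME}"
    snap = f"/dev/{VOLUME_GROUP}/{SNAPSHOT_NAME}"
    return [
        # Create a copy-on-write snapshot; cow_size bounds how many changes
        # to the origin volume the snapshot can absorb while it exists.
        ["lvcreate", "--snapshot", "--size", cow_size,
         "--name", SNAPSHOT_NAME, origin],
        ["mkdir", "-p", SNAPSHOT_MOUNT],
        # Mount the frozen view read-only so the upload sees a stable state.
        ["mount", "-o", "ro", snap, SNAPSHOT_MOUNT],
        # The snapshot lives on the same local SSD, so it only becomes a
        # backup once its contents are copied somewhere persistent.
        ["gsutil", "-m", "rsync", "-r", SNAPSHOT_MOUNT, BUCKET_URL],
        ["umount", SNAPSHOT_MOUNT],
        ["lvremove", "-f", snap],
    ]


def run_snapshot_backup(cow_size="10G", run=subprocess.run):
    """Execute the cycle, stopping at the first command that fails."""
    for cmd in snapshot_backup_commands(cow_size):
        run(cmd, check=True)
```

Running run_snapshot_backup() periodically (from cron, for instance) limits how much work is lost if the VM is stopped or the host fails. This sketch does no cleanup if a step fails midway, so a leftover snapshot or mount would need to be removed by hand.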

Upvotes: 3
