Reputation: 77
I can live-migrate a KVM guest from one node to another while it is running. But when I try to live-migrate it while another node is loading the Cassandra instance on the current host node with the Cassandra stress tool, the migration fails with the following error.
How can I fix it?
Unable to migrate guest: operation failed: domain is no longer running
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 90, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/migrate.py", line 438, in _async_migrate
    meter=meter)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1488, in migrate
    self._backend.migrate(libvirt_destconn, flags, None, interface, 0)
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1535, in migrate
    if ret is None:raise libvirtError('virDomainMigrate() failed', dom=self)
libvirtError: operation failed: domain is no longer running
Upvotes: 1
Views: 926
Reputation: 2826
Live migration has to copy all guest memory pages from the source host to the destination host. The time this takes depends on the available network bandwidth and, to a lesser extent, on CPU performance (if libvirt is encrypting the migration data). While this copying is taking place, the guest may still be modifying its memory, so the same memory pages may need to be copied over and over again. If the guest dirties memory pages faster than QEMU can transfer them over the network, migration will never complete.
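To make the convergence condition concrete, here is a rough back-of-the-envelope model of the copy passes, not QEMU's actual algorithm. It assumes a constant dirty rate and constant bandwidth; the function name, parameters, and the 50 MB "small enough to stop the guest" threshold are all made up for illustration.

```python
def precopy_passes(mem_mb, bandwidth_mb_s, dirty_mb_s,
                   downtime_budget_mb=50, max_passes=30):
    """Return the number of copy passes until the leftover dirty memory
    fits in the (hypothetical) downtime budget, or None if it never does."""
    remaining = float(mem_mb)
    for n in range(1, max_passes + 1):
        # Time needed to send everything that is currently outstanding.
        seconds = remaining / bandwidth_mb_s
        # Memory the guest dirtied meanwhile (cannot exceed total guest RAM).
        remaining = min(float(mem_mb), dirty_mb_s * seconds)
        if remaining <= downtime_budget_mb:
            return n
    return None


# Guest dirties memory slower than the link can carry it: converges.
print(precopy_passes(8192, bandwidth_mb_s=1000, dirty_mb_s=200))
# Guest dirties memory faster than the link can carry it: never converges.
print(precopy_passes(8192, bandwidth_mb_s=1000, dirty_mb_s=1200))
```

In this simplified model each pass shrinks the outstanding data by the ratio of dirty rate to bandwidth, so the migration only finishes when that ratio is below 1.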
Running an intensive workload like Cassandra makes it quite likely that the migration will fail to complete. Several options can be turned on to make it work, but the most useful are "auto-converge", which throttles the guest CPUs to slow down the rate of memory dirtying, and "post-copy", which switches the guest to the target host immediately and then pulls the remaining pages off the source on demand.
Post-copy is the most effective way of getting migration to complete in a finite amount of time, but there is a small risk of losing the VM entirely if the source host or the network breaks before post-copy completes. Auto-converge is a good second choice if post-copy is not available in your version of KVM, or if you are unwilling to take that small risk of post-copy losing the VM.
I don't believe either is available in virt-manager, so you would have to trigger the migration with virsh to use these features.
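As a rough sketch of what that could look like, the helper below only assembles a virsh command line for either mode. The helper name, the guest name "cassandra-vm", and the destination URI are placeholders; the flags are the ones documented for `virsh migrate`, but check `virsh help migrate` on your version, since older releases may lack the post-copy options.

```python
def build_migrate_command(domain, dest_uri, mode="auto-converge"):
    """Build a virsh live-migration command line (illustrative helper)."""
    cmd = ["virsh", "migrate", "--live", "--verbose"]
    if mode == "auto-converge":
        # Throttle guest CPUs if the migration is not making progress.
        cmd.append("--auto-converge")
    elif mode == "post-copy":
        # Enable post-copy and switch to it after the first pre-copy pass.
        cmd += ["--postcopy", "--postcopy-after-precopy"]
    else:
        raise ValueError("mode must be 'auto-converge' or 'post-copy'")
    cmd += [domain, dest_uri]
    return cmd


# Placeholder guest name and destination URI; substitute your own.
print(" ".join(build_migrate_command("cassandra-vm", "qemu+ssh://desthost/system")))
print(" ".join(build_migrate_command("cassandra-vm", "qemu+ssh://desthost/system",
                                     mode="post-copy")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on the source host, or the printed line can simply be pasted into a shell.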
Upvotes: 1