One of the ways we accomplish regularly scheduled maintenance is by using Live Migration, which lets us move workloads from one physical machine to another without service interruption. The way this is done with Logical Domains is far more flexible than with most other hypervisor solutions: it requires no complicated cluster setup and no management layer, so you can use any compatible hardware at the drop of a hat.
This article also touches on a technology I have written about but not yet published (it should be published within the next week): Alternate Service Domains. If you are using Alternate Service Domains, Live Migration is still possible; if you are not, Live Migration is actually easier, because the underlying devices are simpler and therefore easier to match.
Caveats to Migration
Live Migration Dry Run
I recommend performing a dry run of any migration before the actual migration. This highlights any configuration problems ahead of time.
# ldm migrate-domain -n ldom1 root@server
Target Password:
This will report any errors that would occur in an actual migration, but without actually causing you problems.
When you are ready to perform the migration, simply remove the dry-run flag. The real migration performs the same safety checks to ensure that everything is in order on the receiving end.
# ldm migrate-domain ldom1 root@server
Target Password:
The migration will now proceed and, barring problems, the domain will come up on the other system.
Live Migration With Rename
We can also rename the logical domain as part of the migration by simply specifying the new name after the target.
# ldm migrate-domain ldom1 root@server:ldom2
Target Password:
In this case the original name was ldom1 and the new name is ldom2.
Here are some common errors and their likely fixes.
Bad Password or No LDM on Target
# ldm migrate-domain ldom1 root@server
Target Password:
Failed to establish connection with ldmd(1m) on target: server
Check that the 'ldmd' service is enabled on the target machine and
that the version supports Domain Migration. Check that the 'xmpp_enabled'
and 'incoming_migration_enabled' properties of the 'ldmd' service on
the target machine are set to 'true' using svccfg(1M).
Probable Fixes – Ensure you are migrating to the correct hypervisor, that the username/password combination is correct, that the user has the appropriate level of access to ldmd, and that ldmd is running.
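On the target machine, the ldmd service state and the two properties named in the error message can be checked and corrected with the standard SMF tools. This is a sketch; adjust to your environment:

```shell
# Confirm that the ldmd service is online on the target
svcs ldmd

# Check the two properties the error message refers to
svccfg -s ldmd listprop ldmd/xmpp_enabled
svccfg -s ldmd listprop ldmd/incoming_migration_enabled

# If either property is false, set it to true and restart ldmd
svccfg -s ldmd setprop ldmd/incoming_migration_enabled = true
svcadm refresh ldmd
svcadm restart ldmd
```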
Missing Virtual Disk Server Devices
# ldm migrate-domain ldom1 root@server
Target Password:
The number of volumes in mpgroup 'zfs-ib-nfs' on the target (1) differs
from the number on the source (2)
Domain Migration of LDom ldom1 failed
Probable Fixes – Ensure that the underlying virtual disk server devices match; if you are using mpgroups, the entire mpgroup must match on both sides.
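One way to compare the two sides is to list the virtual disk server devices on each machine and check the mpgroup membership. The backend path and volume name below are purely illustrative:

```shell
# On both source and target, list services; the VDS section shows
# each volume and the mpgroup it belongs to
ldm list-services

# If a volume is missing from the mpgroup on the target, add it there.
# The backend path, volume name, and service name here are hypothetical:
ldm add-vdsdev mpgroup=zfs-ib-nfs /ldoms/disks/ldom1-disk0.img ldom1-disk0@alternate-vds0
```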
Missing Virtual Switch Device
# ldm migrate-domain ldom1 root@server
Target Password:
Failed to find required vsw alternate-vsw0 on target machine
Domain Migration of LDom ldom1 failed
Probable Fixes – Ensure that the underlying virtual switch devices match in both locations.
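As with the disks, the virtual switches can be compared with `ldm list-services` on each machine, and a missing switch can be created on the target. The physical network device is an example; pick one appropriate to your hardware:

```shell
# List services on both machines and compare the VSW sections
ldm list-services

# Create the missing virtual switch on the target machine,
# backed by a suitable physical network device (net1 is an example)
ldm add-vsw net-dev=net1 alternate-vsw0 primary
```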
Check Migration Progress
One thing to keep in mind is that during the migration process, the hypervisor being evacuated is the authoritative one in terms of controlling the process, so status should be checked there.
source# ldm list -o status ldom1
NAME
ldom1

STATUS
    OPERATION    PROGRESS    TARGET
    migration      20%       172.16.24.101:logdom1
It can, however, be checked on the receiving end, though the output looks a little different.
target# ldm list -o status logdom1
NAME
logdom1

STATUS
    OPERATION    PROGRESS    SOURCE
    migration      30%       ak00176306-primary
The big thing to notice is that this side shows the source rather than the target; also, since we changed the name as part of this migration, the receiving end shows the new name.
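If you want to watch the progress without re-running the command by hand, a simple polling loop on the source works. This is a sketch; the domain name and interval are up to you:

```shell
# Re-check migration status on the source every 10 seconds
# until the migration operation is no longer listed
while ldm list -o status ldom1 | grep -q migration; do
    ldm list -o status ldom1 | grep migration
    sleep 10
done
```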
Of course, if you need to cancel a migration, this is done on the hypervisor being evacuated, since it is authoritative.
# ldm cancel-operation migration ldom1
Domain Migration of ldom1 has been cancelled
This lets you cancel an accidentally started migration; in practice, though, most situations that would require a cancellation generate an error before you ever get this far.
Cross CPU Considerations
By default, logical domains are created to use very specific CPU features based on the hardware they run on, so live migration works by default only between the exact same CPU type and generation. However, if we change the cpu-arch property of a domain, we can trade some CPU features for broader migration compatibility. The available values are:
native – Allows migration only between the same CPU type and generation (the default).
generic – Uses the most generic processor feature set, allowing the widest live migration compatibility.
migration-class1 – Allows migration between T4, T5 and M5 server classes (also supports M10, depending on firmware version).
sparc64-class1 – Allows migration between Fujitsu M10 servers.
Here is an example of changing the CPU architecture of a domain. I recommend using this sparingly and building your hardware infrastructure so that you have capacity on the same generation of hardware; in certain circumstances, though, this can make a lot of sense if the performance implications are not too great.
# ldm set-domain cpu-arch=migration-class1 ldom1
I personally wouldn’t count on the cross-CPU functionality, but in some cases it may make sense for your situation. Either way, Live Migration of Logical Domains is implemented very effectively and adds a lot of value.