Reputation: 344
I am working on a project where we use ZFS as a storage volume manager. On top of ZFS, an iSCSI tgt daemon runs and exposes the ZFS devices as SCSI disks. The problem now is ZFS high availability: ZFS itself cannot be clustered. I avoided the solutions below because each has issues:
https://github.com/ewwhite/zfs-ha/wiki: requires the servers to be up in order to export the zpool metadata during the failover
Using snapshots: snapshots are good for backups, but not for high availability. In fact, I lost data during the failover because the two pools were not synchronized: the second pool only has the last snapshot taken before the first server died, so all the data written after that snapshot and before the failover is lost (see the sketch below).
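For clarity, here is a minimal sketch of the kind of replication loop I mean; the dataset name, standby host, and interval are made up. Everything written during the sleep between two completed sends exists only on the primary, which is exactly the data I lost:

```python
#!/usr/bin/env python3
"""Minimal sketch of snapshot-shipping replication (illustrative names only).

Anything written after the last completed send exists only on the primary,
which is exactly the window in which data is lost on failover.
"""
import subprocess
import time

DATASET = "tank/vol0"        # hypothetical dataset backing the iSCSI LUN
STANDBY = "standby.example"  # hypothetical second server
INTERVAL = 300               # seconds between snapshots = the loss window

def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

prev = None
while True:
    snap = f"{DATASET}@repl-{int(time.time())}"
    sh(f"zfs snapshot {snap}")
    if prev is None:
        # first pass: full send
        sh(f"zfs send {snap} | ssh {STANDBY} zfs receive -F {DATASET}")
    else:
        # incremental send of everything between the two snapshots
        sh(f"zfs send -i {prev} {snap} | ssh {STANDBY} zfs receive -F {DATASET}")
    prev = snap
    time.sleep(INTERVAL)  # writes arriving here never reach the standby
```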
Is there any way to make these SCSI disks highly available by making the ZFS pool highly available? Could adding a clustered filesystem on top of ZFS make any sense?
Upvotes: 3
Views: 3926
Reputation: 7737
Andrew Henle’s comment is the most obvious way to do this: force-import the pool with zpool import -f on the secondary server and prevent the primary from re-importing the storage. The second part is the hard part, though!
If you can physically detach the storage immediately after the server dies, perfect. If not (which will be the case for most systems), you will need some way to manage this transfer of pool ownership between servers, probably with some kind of keepalive / ownership-lease protocol; a sketch of such a loop follows below. You can do this either in the storage itself or at some higher level.
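As a rough illustration of what that ownership-lease / keepalive protocol could look like above the storage layer, here is a minimal sketch from the standby's point of view. The pool name, lease store, timings, and fencing hook are all assumptions rather than an existing tool; only the zpool import -f call is the real command mentioned above:

```python
#!/usr/bin/env python3
"""Minimal sketch of a keepalive / ownership-lease failover loop (standby side).

Assumptions (not an existing tool): the lease lives in some store both
servers can reach, and LEASE_TTL is comfortably larger than the active
node's renew interval.
"""
import subprocess
import time

POOL = "tank"        # hypothetical pool name
LEASE_TTL = 30       # seconds without renewal before we assume the owner is dead
CHECK_INTERVAL = 5

def read_lease_age():
    """Return seconds since the active node last renewed its lease.

    Placeholder: a real setup would query whatever shared store holds the
    lease (a small key-value store, a file on a third host, ...).
    """
    raise NotImplementedError

def fence_old_owner():
    """Placeholder for fencing: cut the old owner's power or path to the disks."""
    raise NotImplementedError

def take_over():
    fence_old_owner()  # critical: keep the old primary from writing again
    # Force-import the pool even though it was last imported on the other node.
    subprocess.run(["zpool", "import", "-f", POOL], check=True)
    # ...then re-export the zvols over iSCSI and move the service IP here.

while True:
    if read_lease_age() > LEASE_TTL:
        take_over()
        break
    time.sleep(CHECK_INTERVAL)
```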
If you keep the lease in the storage itself, it is better to base it on the pool's transaction group (txg) counter instead of using timestamp-based leases, since timestamps mean your servers need very similar times on them or your mutual exclusion may not work (although the ZIL creates issues for this because it can be updated outside of a txg, IIRC). Ideally this would be a feature of ZFS itself, but I don’t think anyone has implemented it yet (although I know it’s been discussed).

Ultimately the best solution is to use high-level symptoms to decide whether to fail over, but low-level mutual-exclusion enforcement. Without support inside ZFS for mutual exclusion, however, you may need to do both above the ZFS layer, for example by making a shim layer that checks for ownership before issuing a write to ZFS (sketched below).
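To make the shim-layer idea concrete, here is a purely illustrative sketch: every write into ZFS goes through a guard that re-checks this node's lease first, so a node that has lost ownership stops issuing writes even before it has noticed the failover. The lease_reader callable and the TTL are assumptions about whatever lease store you choose.

```python
"""Minimal sketch of an ownership-checking write shim (illustrative only)."""
import time

class OwnershipLost(Exception):
    pass

class WriteGuard:
    def __init__(self, lease_reader, ttl):
        # lease_reader: callable returning (owner_id, renewed_at) from the
        # shared lease store (hypothetical, supplied by the caller).
        self._lease_reader = lease_reader
        self._ttl = ttl

    def _i_am_owner(self, my_id):
        owner, renewed_at = self._lease_reader()
        return owner == my_id and (time.time() - renewed_at) < self._ttl

    def guarded_write(self, my_id, write_fn, *args, **kwargs):
        # Re-validate ownership immediately before touching the pool.
        if not self._i_am_owner(my_id):
            raise OwnershipLost("lease lost; refusing to write to the pool")
        return write_fn(*args, **kwargs)

# Usage sketch (read_lease_from_shared_store and blockdev are hypothetical):
#   guard = WriteGuard(lease_reader=read_lease_from_shared_store, ttl=30)
#   guard.guarded_write("node-a", blockdev.write, offset, data)
```

Note that a shim like this only narrows the race window; a write already in flight when ownership changes can still land, which is why enforcement inside ZFS itself would be the ideal.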
If you think network partitions and performance problems are not really going to be an issue compared to machine crashes / reboots (probably a reasonable assumption in small-ish datacenters, since those are lower-probability events), then you probably don’t need the storage-level mutual exclusion at all, and the higher-layer solution would work fine.
Upvotes: 2
Reputation: 1089
See https://mezzantrop.wordpress.com/portfolio/the-beast/ to check whether it's applicable to you.
Upvotes: 0