Ron Bresler

Reputation: 45

Cassandra Datastax AMI on EC2 - Recover from "Stop"/"Start"

We're looking for the best way to deploy a small production Cassandra cluster (community) on EC2. For performance reasons, all recommendations are to avoid EBS.

But when deploying the Datastax-provided AMI with ephemeral storage, the instance dies permanently whenever the ephemeral storage is wiped. A Stop + Start (done manually, or sometimes triggered by AWS for maintenance) renders the instance unusable. OpsCenter fails to fix the instance after a reboot, and the instance does not recover on its own.

I'd expect the instance to launch itself back up, run some script to detect that the ephemeral storage is wiped, and sync with the cluster. Since it does not, the AMI looks appropriate only for dev tasks.

Can anyone please help us understand what the alternative is? We can live with a momentary loss of a node thanks to replication, but if the node never recovers and a new cluster is required, this looks like a dead end for a production environment.

  1. Is there a way to install Cassandra on EC2 so that it will recover from an ephemeral storage loss?

  2. If we buy a license for an enterprise edition will this problem go away?

  3. Does this mean that, in spite of poor performance, EBS (optimized) with PIOPS is the best way to run Cassandra on AWS?

  4. Is the recommendation to just avoid stopping + starting the instance and hope that AWS will not retire or reallocate their host machine? What is the recommendation in this case?

  5. What about an AWS rolling update? Upgrading one machine (killing it), starting it again, and then proceeding to the next machine will erase all cluster data, because each machine becomes responsive again even though Cassandra on it has not recovered. That way it can destroy a small (e.g. 3-node) cluster.

  6. Has anyone had good experience with paid services such as Instaclustr?

Upvotes: 2

Views: 423

Answers (1)

Louis T.

Reputation: 62

New docs from Datastax actually indicate that EBS-optimized instances backed by GP2 SSD volumes can be used for production workloads. With EBS-backed storage you can easily take snapshots, which virtually eliminates the chance of data loss on a node, and a node can easily be migrated to a new host with a simple stop/start.

With ephemeral storage, you basically have to plan around failure. Consider, for example, what happens if your entire cluster is in a single region (SimpleSnitch) and that region goes down.

http://docs.datastax.com/en/cassandra/3.x/cassandra/planning/planPlanningEC2.html
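To make "plan around failure" a bit more concrete, here is a rough sketch of the kind of boot-time check the question asks for. It is not something the AMI ships with. It assumes the node should rejoin as a replacement for itself using Cassandra's `cassandra.replace_address` startup property when its data directory comes back empty. The function names, the data directory path, and the IP address are hypothetical placeholders, and passing the returned option to the Cassandra start script is left to your own init tooling.

```python
import os


def data_dir_is_wiped(data_dir):
    """Return True if the data directory is missing or has no entries.

    An empty or missing directory is treated here as a sign that the
    ephemeral volume was reset by a stop/start (an assumption; adapt the
    check to however your volumes are mounted).
    """
    return not os.path.isdir(data_dir) or not os.listdir(data_dir)


def startup_jvm_opts(data_dir, node_ip):
    """Pick extra JVM options for the next Cassandra start.

    If the data is gone, ask Cassandra to stream this node's data back
    from the other replicas by replacing the old (dead) entry for node_ip.
    Otherwise start normally with no extra options.
    """
    if data_dir_is_wiped(data_dir):
        return ["-Dcassandra.replace_address=" + node_ip]
    return []


if __name__ == "__main__":
    # Hypothetical path and address, for illustration only.
    print(startup_jvm_opts("/var/lib/cassandra/data", "10.0.0.5"))
```

This only works if the remaining replicas still hold the data, so it covers losing one node at a time. It does not cover the rolling-update case in the question, where every node is wiped in turn.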

Upvotes: 0
