Reputation: 466
I am planning to create a cluster with three nodes, with each node launched in a different Amazon EC2 zone.
As per the DataStax documentation, I will use Ec2MultiRegionSnitch, and the replication strategy will be NetworkTopologyStrategy. Below is what I need to achieve:
Cluster Size : 3 (Spanning Across Amazon EC2 Regions).
Replication Factor: 3
Read and Write Level : QUORUM.
Based on the above configuration, I can survive the loss of a single node (meaning that any one of the Amazon regions can go down; correct me if I am wrong).
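The single-node-loss claim can be checked with a little arithmetic. The snippet below is a minimal sketch, assuming Cassandra's usual QUORUM rule (half of the total replication factor, rounded down, plus one); the function names are made up for illustration and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM is still reachable."""
    return replication_factor - quorum(replication_factor)


rf = 3  # replication factor from the question
print(quorum(rf))                      # 2
print(tolerable_replica_failures(rf))  # 1
```

With a replication factor of 3, two replicas must answer, so one node (and therefore one region, if each region holds exactly one node) can be lost.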
To achieve the above configuration, I have two options:
Option-1 : Using the DataStax-provided Amazon EC2 AMI.
This option launches the instance with almost all the components needed to run Cassandra, plus some monitoring tools (OpsCenter, etc.).
But it stores all data on the EC2 instance store, so data persists only for the life of the instance, and the storage size depends on the instance type.
Option-2 : Using a customised installation.
In this option, I have to launch an Amazon EC2 Ubuntu AMI, install Java, and install DataStax Community Edition.
This option enables me to store all my data on EBS. Hence I can expand the EBS volume whenever I need to, and at the same time I can restore any node from an EBS snapshot.
My Question:
Which of these options is more suitable for my needs?
Note:
I have read the documentation provided by DataStax, but I am very new to Cassandra. Hence, whatever input you provide will be very useful to me.
Thanks
Upvotes: 0
Views: 278
Reputation: 2166
It's not true that the DataStax AMI only comes with EC2 ephemeral storage. Starting from version 2.5, they claim you can choose EBS as well: Introducing the DataStax Auto-Clustering AMI 2.5. That's a relatively easy way of getting started, and it's the one I've personally chosen.
Should you choose EBS or EC2 ephemeral storage?
The answer is: it depends...
The past (~2012-2013):
EC2 instances with ephemeral storage were the better choice. Detailed performance benchmarks over the years indicated that EBS was getting better, but locally attached physical drives were still better.
The past (~2014):
EC2 ephemeral storage is still the better choice. DataStax wrote a nice post about pricing, network and failure resilience: What is the story with AWS storage?
Present (~2016):
instaclustr claims:
By running Cassandra on Amazon EBS, you can run denser, cheaper Cassandra clusters with just as much availability as ephemeral storage instances.
Nice presentation here: AWS re:Invent 2015 | (BDT323) Amazon EBS & Cassandra: 1 Million Writes Per Second on 60 Nodes
All in all, I suggest doing a TCO (total cost of ownership) analysis, and if there isn't a big difference in price, choose EBS because of its out-of-the-box ability to take snapshots. What's more, chances are EBS will improve over time.
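A rough starting point for that TCO comparison might look like the sketch below. It only covers compute and storage; the function name, the 730-hour month, and every price in it are placeholder assumptions for illustration, not real AWS figures, so substitute current pricing for your region and instance types.

```python
HOURS_PER_MONTH = 730  # assumed average number of hours in a month


def monthly_cost(nodes: int,
                 instance_price_per_hour: float,
                 ebs_gb_per_node: int = 0,
                 ebs_price_per_gb_month: float = 0.0) -> float:
    """Estimate monthly cluster cost: instance hours plus optional EBS storage."""
    compute = nodes * instance_price_per_hour * HOURS_PER_MONTH
    storage = nodes * ebs_gb_per_node * ebs_price_per_gb_month
    return compute + storage


# Hypothetical prices only: a storage-heavy instance with ephemeral disks
# versus a cheaper instance with a 500 GB EBS volume per node.
ephemeral = monthly_cost(nodes=3, instance_price_per_hour=0.50)
ebs_backed = monthly_cost(nodes=3, instance_price_per_hour=0.30,
                          ebs_gb_per_node=500, ebs_price_per_gb_month=0.10)
print(f"ephemeral:  {ephemeral:.2f} per month")
print(f"EBS-backed: {ebs_backed:.2f} per month")
```

A fuller analysis would also account for snapshot storage, provisioned IOPS, cross-region traffic, and the operational cost of rebuilding a node after an instance-store loss.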
Upvotes: 0