Reputation: 1
I would like to set up a Hadoop cluster in AWS with a total capacity of roughly 100 TB. Looking at the available instance types at http://aws.amazon.com/ec2/instance-types/ , I don't see an ideal configuration for data nodes; I would like to use local disks (SSD or non-SSD) for the worker nodes. For example, if I select the cc2.8xlarge instance for data nodes, then for 100 TB I would have to set up about 30 cc2.8xlarge instances, which would be very costly. Could you please suggest how I should configure my cluster on AWS (EC2) with a minimum number of data nodes, or is there a standard configuration for Hadoop on AWS?
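For context, the 30-instance figure follows from the cc2.8xlarge's local instance storage (4 x 840 GB, roughly 3.4 TB per node). A quick sketch of the arithmetic, assuming the 100 TB is raw HDFS capacity; with HDFS's default 3x replication the node count roughly triples:

```python
import math

# cc2.8xlarge local instance storage: 4 x 840 GB (per the published specs).
storage_per_node_tb = 4 * 0.84  # ~3.36 TB per node

target_tb = 100  # desired capacity

# If 100 TB means raw disk across the cluster:
print(math.ceil(target_tb / storage_per_node_tb))      # -> 30 nodes

# If 100 TB means usable capacity, HDFS's default replication
# factor of 3 triples the raw storage requirement:
print(math.ceil(target_tb * 3 / storage_per_node_tb))  # -> 90 nodes
```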
Upvotes: 0
Views: 80
Reputation: 13065
If you want to run Hadoop yourself, then use EBS volumes. You can attach a bunch of them (around 10-20 per node, as I recall), and each volume can be up to 1 TB.
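A minimal sketch of that approach with boto3; the instance ID and availability zone are placeholders, and the 1 TB size reflects the per-volume limit mentioned above:

```python
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

INSTANCE_ID = 'i-0123456789abcdef0'  # hypothetical datanode instance
AZ = 'us-east-1a'                    # must match the instance's AZ

# Attach twelve 1 TB volumes as /dev/sdf, /dev/sdg, ..., /dev/sdq.
for letter in 'fghijklmnopq':
    vol = ec2.create_volume(Size=1000, AvailabilityZone=AZ,
                            VolumeType='standard')
    ec2.get_waiter('volume_available').wait(VolumeIds=[vol['VolumeId']])
    ec2.attach_volume(VolumeId=vol['VolumeId'],
                      InstanceId=INSTANCE_ID,
                      Device='/dev/sd{}'.format(letter))
```

Each attached volume still needs a filesystem and a mount point on the instance, and the mount points then go into dfs.datanode.data.dir in hdfs-site.xml so the DataNode stripes blocks across all of them.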
If you don't want to do it yourself, then look into EMR, as monkeymatrix said.
Upvotes: 0
Reputation:
It sounds very much like you want to consider Elastic MapReduce (EMR), which is a core AWS service based on Hadoop.
http://aws.amazon.com/elasticmapreduce/
You can specify your configuration and the cluster will launch for you, which is much easier than trying to configure EC2 instances yourself.
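A minimal sketch of launching a cluster with boto3's run_job_flow call; the cluster name, instance types, count, and release label here are placeholders you would size to your storage target (the roles are the EMR defaults):

```python
import boto3

emr = boto3.client('emr', region_name='us-east-1')

# Hypothetical sizing: one master plus storage-dense core nodes;
# adjust InstanceCount based on the capacity arithmetic above.
response = emr.run_job_flow(
    Name='hadoop-100tb-cluster',        # placeholder name
    ReleaseLabel='emr-5.36.0',          # pick a current EMR release
    Applications=[{'Name': 'Hadoop'}],
    Instances={
        'MasterInstanceType': 'm5.xlarge',  # placeholder type
        'SlaveInstanceType': 'd2.8xlarge',  # storage-dense workers
        'InstanceCount': 10,
        'KeepJobFlowAliveWhenNoSteps': True,
    },
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
)
print(response['JobFlowId'])
```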
Upvotes: 1