user4927715


Estimating hardware for Hadoop

I have 1 TB of Hive data and I want to process it within 2 hours. The Hadoop cluster will not grow, because it has no user interaction. How much RAM and CPU does each machine need if I want to run 3 machines?

Upvotes: 1

Views: 151

Answers (1)

mattinbits

Reputation: 10428

This depends on the complexity of your process. A simple word count will surely complete before a complex data science algorithm on the same hardware. Your choice of implementation (e.g. MapReduce vs Spark) will also influence execution time.

For any given hardware specification, some processes may complete while others may miss the deadline. You won't get a complete answer without giving more details about your workload (and even then the answer will probably be a recommendation to run practical experiments with your particular process). However, I can say that when sizing a cluster, there are two resources I tend to reference:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.2/bk_cluster-planning-guide/content/ch_hardware-recommendations.html

http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/

The Cloudera blog in particular discusses how hardware requirements differ depending on whether your workload is storage intensive, compute intensive, etc.
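Before running experiments, a back-of-envelope calculation can at least tell you the minimum read throughput each node must sustain to meet the deadline. The sketch below uses the figures from the question (1 TB, 2 hours, 3 machines). The function name and the `passes` parameter are mine, and it assumes a decimal terabyte, data spread evenly across nodes, and a job that only scans the data; it ignores shuffle, replication, and CPU cost, so treat the result as a lower bound rather than a sizing answer.

```python
def required_throughput_mb_s(data_bytes, deadline_s, nodes, passes=1):
    """Lower bound on per-node scan throughput in MB/s.

    Assumes the data is evenly distributed and that each pass reads
    the full data set once (hypothetical simplification).
    """
    if deadline_s <= 0 or nodes <= 0:
        raise ValueError("deadline_s and nodes must be positive")
    aggregate_bytes_per_s = data_bytes * passes / deadline_s
    return aggregate_bytes_per_s / nodes / 1e6


DATA_BYTES = 1e12        # 1 TB, decimal (assumption)
DEADLINE_S = 2 * 3600    # 2 hours
NODES = 3                # machines in the cluster

per_node = required_throughput_mb_s(DATA_BYTES, DEADLINE_S, NODES)
print(f"{per_node:.1f} MB/s per node")  # prints "46.3 MB/s per node"
```

If your process reads the data several times (for example an iterative algorithm), raise `passes` accordingly. If the measured throughput of a test run on a sample is well below this figure, the deadline will not be met no matter how much RAM you add.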

Upvotes: 2
