Bhavesh Shah

Reputation: 3379

Speed of job execution in Amazon Elastic MapReduce

My task is:
1) Import the data from MS SQL Server into HDFS using Sqoop.
2) Process the data with Hive and generate the result in one table.
3) Export that result table from Hive back to MS SQL Server.

I want to perform all of this using Amazon Elastic MapReduce.

The data I am importing from MS SQL Server is very large (about 500,000 rows in one table, and I have 30 such tables). To process it I have written a Hive task that contains only queries, and each query uses a lot of joins. Because of this, performance is very poor on my single local machine: the task takes about 3 hours to execute completely.

I want to reduce that time as much as possible, which is why we decided to use Amazon Elastic MapReduce. Currently I am using 3 m1.large instances, and I still get the same performance as on my local machine.

How many instances do I need to use to improve the performance? Are the instances I request configured automatically, or do I need to specify something when submitting the JAR for execution? I ask because when I use two machines, the time is the same.

Also, is there any other way to improve the performance, or is increasing the number of instances the only option? Or am I doing something wrong when executing the JAR?

Please guide me through this, as I don't know much about the Amazon servers.

Thanks.

Upvotes: 0

Views: 138

Answers (1)

rockbobsta

Reputation: 58

You could try Ganglia, which can be installed on your EMR cluster using a bootstrap action. It will give you metrics on the performance of each node in the cluster, which may help you find the right cluster size for your job: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_Ganglia.html

If you use the EMR Ruby client on your local machine, you can set up an SSH tunnel that lets you view the Ganglia web interface in Firefox. You'll also need to set up FoxyProxy as described here: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-foxy-proxy.html
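If you'd rather open the tunnel by hand with plain ssh, the command looks roughly like the one this small Python sketch builds. The host name and key file are placeholders I made up, and port 8157 is just the local SOCKS port I'm assuming FoxyProxy is pointed at, so swap in your own values. I'm also assuming the default `hadoop` login on the master node and that Ganglia is served under `/ganglia` there.

```python
import shlex


def build_tunnel_command(master_dns, key_path, local_port=8157):
    """Return the argv list for an SSH SOCKS tunnel to the EMR master node.

    master_dns, key_path and local_port are illustrative parameters;
    replace them with the values for your own cluster.
    """
    return [
        "ssh",
        "-i", key_path,          # private key of the EC2 key pair used by the cluster
        "-N",                    # no remote command, forwarding only
        "-D", str(local_port),   # dynamic (SOCKS) port forwarding on localhost
        f"hadoop@{master_dns}",  # assumes the default 'hadoop' user on the master
    ]


def ganglia_url(master_dns):
    """URL to open in the browser once the tunnel and proxy are active (assumed path)."""
    return f"http://{master_dns}/ganglia"


if __name__ == "__main__":
    # Placeholder host name and key file, not real values.
    dns = "ec2-203-0-113-10.compute-1.amazonaws.com"
    print(shlex.join(build_tunnel_command(dns, "mykeypair.pem")))
    print(ganglia_url(dns))
```

Run the printed ssh command in a terminal and leave it open, then browse to the printed URL with FoxyProxy enabled. Watching per-node CPU and load there should show whether all the instances are actually busy during your Hive queries.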

Upvotes: 2
