anurag1007

Reputation: 137

Ideal Spark configuration

I am using Apache Spark on HDFS with MapR in our project. We are facing issues running Spark jobs: they fail after a small increase in the data volume. We read data from a CSV file, do some transformation and aggregation, and then store the result in HBase.

Current Data size = 3TB

Available resources:
Total nodes: 14
Memory available: 1 TB
Total vcores: 450
Total disk: 150 TB

Spark conf:
executorCores: 2
executorInstance: 50
executorMemory: 40GB
minPartitions: 600
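For clarity, this is roughly how those settings are applied in the job, as a sketch using standard Spark property names. The application name and input path are placeholders; in reality the job is submitted through our MapR/YARN scripts, which may pass the same values as spark-submit flags instead.

```scala
import org.apache.spark.sql.SparkSession

object CsvToHBaseJob {
  def main(args: Array[String]): Unit = {
    // Sketch of the current configuration using standard Spark property names.
    val spark = SparkSession.builder()
      .appName("csv-to-hbase")                     // placeholder app name
      .config("spark.executor.cores", "2")
      .config("spark.executor.instances", "50")
      .config("spark.executor.memory", "40g")
      .getOrCreate()

    // minPartitions = 600 is applied when reading the raw file through the RDD API.
    val lines = spark.sparkContext.textFile("/path/to/input.csv", minPartitions = 600) // placeholder path
    println(s"partitions: ${lines.getNumPartitions}")
  }
}
```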

Please suggest whether the above configuration looks fine, because the error I am getting looks like an out-of-memory error.

Upvotes: 0

Views: 93

Answers (1)

Ted Dunning

Reputation: 1907

Can you say a bit more about how the jobs are failing? Without more information, it will be very hard to say. It would also help to know which version of Spark you are using and whether you are running under YARN, with a standalone Spark cluster, or on Kubernetes.

Even without that information, however, it seems likely that there is a configuration issue here. What may be happening is that Spark is being told contradictory things about how much memory is available, so that when it tries to use memory it thinks it is allowed to use, the system says no.
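For example, with the numbers in the question, 50 executors at 40 GB each ask for roughly 2 TB of executor heap on a cluster that has about 1 TB of memory, before any per-container overhead is counted. Below is a sketch of settings that at least agree with each other and with the stated hardware; the exact values depend on your Spark version, the cluster manager, and what else runs on those nodes, so treat the numbers as an illustration rather than a recommendation.

```scala
import org.apache.spark.sql.SparkSession

object ConsistentMemoryConfig {
  def main(args: Array[String]): Unit = {
    // Sketch only: 20g heap + 4g overhead = 24g per container, so
    // 35 executors need about 840 GB, which fits inside the cluster's 1 TB
    // with headroom for the OS, the driver and other services.
    // 35 executors * 4 cores = 140 vcores, well under the 450 available.
    val spark = SparkSession.builder()
      .appName("csv-to-hbase")                        // placeholder app name
      .config("spark.executor.instances", "35")
      .config("spark.executor.cores", "4")
      .config("spark.executor.memory", "20g")
      .config("spark.executor.memoryOverhead", "4g")  // Spark 2.3+; older versions use spark.yarn.executor.memoryOverhead
      .config("spark.sql.shuffle.partitions", "600")  // keep shuffle partition count in line with the input partitioning
      .getOrCreate()

    // ... read CSV, transform, aggregate, write to HBase as in the question ...
  }
}
```

The key point is that executor heap plus overhead must fit inside whatever container limit the cluster manager enforces, and the total across all executors must fit inside the physical memory actually available.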

Upvotes: 1
