Abhinandan Satpute

Reputation: 2678

Why does Spark Standalone cluster not use all available cores?

I have set up an Apache Spark 1.2.1 standalone cluster, and I submit my job to it as follows:

./spark-submit --class com.b2b.processor.ProcessSampleJSONFileUpdate \
               --conf num-executors=2 \
               --executor-memory 2g \
               --driver-memory 3g \
               --deploy-mode cluster \
               --supervise \
               --master spark://abc.xyz.net:7077 \
               hdfs://abc:9000/b2b/b2bloader-1.0.jar ds6_2000/*.json 

My job executes successfully, i.e. it reads data from the files and inserts it into Cassandra.

The Spark documentation says that in standalone mode an application uses all available cores by default, but my cluster is using only 1 core per application. Also, after starting the application, the Spark UI shows Applications: 0 Running and Drivers: 1 Running.

My questions are:

  1. Why is it not using all 6 available cores?
  2. Why does the Spark UI show Applications: 0 Running?

The code:

public static void main(String[] args) throws Exception {

  String fileName = args[0];
  System.out.println("----->Filename : "+fileName);        

  Long now = new Date().getTime();

  SparkConf conf = new SparkConf(true)
           .setMaster("local")
           .setAppName("JavaSparkSQL_" +now)
           .set("spark.executor.memory", "1g")
           .set("spark.cassandra.connection.host", "192.168.1.65")
           .set("spark.cassandra.connection.native.port", "9042")
           .set("spark.cassandra.connection.rpc.port", "9160");

  JavaSparkContext ctx = new JavaSparkContext(conf);

  JavaRDD<String> input =  ctx.textFile("hdfs://abc.xyz.net:9000/dataLoad/resources/" + fileName,6);
  JavaRDD<DataInput> result = input.mapPartitions(new ParseJson()).filter(new FilterLogic());

  System.out.print("Count --> "+result.count());
  System.out.println(StringUtils.join(result.collect(), ","));

  javaFunctions(result).writerBuilder("ks","pt_DataInput",mapToRow(DataInput.class)).saveToCassandra();

}

Upvotes: 4

Views: 4351

Answers (3)

User2130

Reputation: 565

What was happening is that you thought you were using standalone mode, for which the default is to use all available cores, but with "local" as the master you were actually running in local mode. Local mode is a non-distributed, single-JVM deployment, and plain "local" runs everything on a single thread (you would need local[*] to use one thread per core, and even then it remains a single JVM with no separate executors). This is also why everything behaved as you expected once you changed your master parameter to "spark://abc.xyz.net:7077".
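To make the distinction concrete, here is a sketch of the common master URL settings and what each one means (the host and app name are taken from the question; the cluster itself must exist for the last option to work):

```java
// "local"              -> one worker thread in a single JVM (the questioner's setting)
// "local[*]"           -> one thread per available core, still a single JVM
// "spark://host:7077"  -> standalone cluster; uses all available cores by default
SparkConf conf = new SparkConf(true)
        .setMaster("spark://abc.xyz.net:7077")  // standalone cluster, not local mode
        .setAppName("JavaSparkSQL");
```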

Upvotes: 2

eliasah

Reputation: 40380

If you set the master in your app to local (via .setMaster("local")), the application will not connect to spark://abc.xyz.net:7077.

You don't need to set the master in the app at all if you supply it with the spark-submit command.
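For example, with .setMaster(...) removed from the code, the submit command alone carries the master. A sketch based on the question's command (note that num-executors is a YARN option; in standalone mode the closest control is --total-executor-cores, shown here as an assumed replacement):

```shell
./spark-submit --class com.b2b.processor.ProcessSampleJSONFileUpdate \
               --total-executor-cores 6 \
               --executor-memory 2g \
               --driver-memory 3g \
               --deploy-mode cluster \
               --supervise \
               --master spark://abc.xyz.net:7077 \
               hdfs://abc:9000/b2b/b2bloader-1.0.jar ds6_2000/*.json
```

This way the same jar can run locally or on the cluster without recompiling.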

Upvotes: 7

None

Reputation: 1468

Try setting the master to local[*]; this will use all available cores.
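A minimal sketch of that setting (keep in mind this still runs everything inside one JVM, so it only helps for local testing, not for the standalone cluster):

```java
SparkConf conf = new SparkConf()
        .setMaster("local[*]")   // one worker thread per available core
        .setAppName("JavaSparkSQL");
```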

Upvotes: 0
