Reputation: 2017
I have Spark 1.5.1 (built for Hadoop 2.6) running in standalone mode on my local machine. I am trying to run a Hive query from a sample Java application, with spark.master pointing to the Spark master running on my local machine (spark://impetus-i0248u:7077). Here is the Java code:
SparkConf sparkconf = new SparkConf()
        .set("spark.master", "spark://impetus-i0248u:7077")
        .set("spark.app.name", "sparkhivesqltest")
        .set("spark.cores.max", "2")
        .set("spark.executor.memory", "2g")
        .set("worker_max_heapsize", "2g")
        .set("spark.driver.memory", "2g");
SparkContext sc = new SparkContext(sparkconf);
HiveContext sqlContext = new HiveContext(sc);
DataFrame jdbcDF = sqlContext.sql("select * from bm.rutest");
List<Row> employeeFullNameRows = jdbcDF.collectAsList();
The HiveContext initializes properly, since it is able to connect to the Hive metastore. The exception is thrown at jdbcDF.collectAsList().
Here is the error that appears when Spark tries to submit the job:
15/12/10 20:00:42 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at collectAsList at HiveJdbcTest.java:30)
15/12/10 20:00:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/12/10 20:00:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.26.52.54, ANY, 2181 bytes)
15/12/10 20:00:42 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.26.52.54, ANY, 2181 bytes)
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-5"
Exception in thread "shuffle-server-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "shuffle-server-1"
Exception in thread "threadDeathWatcher-2-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "threadDeathWatcher-2-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-6"
Exception in thread "qtp1003369013-56"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1003369013-56"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-21"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.actor.default-dispatcher-17"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-23"
Exception in thread "shuffle-server-2"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "shuffle-server-2"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.actor.default-dispatcher-2"
Below is the configuration I added to spark-env.sh:
SPARK_EXECUTOR_CORES=2
SPARK_EXECUTOR_MEMORY=3G
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2G
SPARK_EXECUTOR_INSTANCES=2
SPARK_WORKER_INSTANCES=1
If I set spark.master to local[*], it works fine, but when I point it to the master running on my machine, I get the exception above. If I connect to a MySQL database with the same configuration, it also works fine.
PS: The table has only a single row.
Please help!
Upvotes: 1
Views: 1361
Reputation: 1420
Here are explanations of the concepts in your question:
Given the above, since you are running on a local box, it is quite possible that local[*] or collectAsList() will not cause an OOM, while collect() may produce memory exceptions.
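Since the OutOfMemoryError messages all come from sparkDriver threads, it may also be worth checking how much heap the driver JVM actually has. When the application is started directly as a plain Java program (client mode), the driver JVM is already running by the time the SparkConf is built, so setting spark.driver.memory in code does not resize its heap; the limit then comes from the JVM launch options (for example -Xmx) or from spark-submit's --driver-memory. The following is a minimal sketch, not part of the original code: the class name DriverHeapCheck and the method maxHeapMb are hypothetical, and it uses only the standard Runtime API.

```java
public class DriverHeapCheck {

    // Returns the maximum heap the current JVM will attempt to use, in megabytes.
    // This reflects the JVM launch settings, not values set later in SparkConf.
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run this inside the same JVM (same launch options) as the Spark driver.
        System.out.println("Driver JVM max heap: " + maxHeapMb() + " MB");
    }
}
```

If the printed value is well below the 2g requested through spark.driver.memory, that would suggest the setting is not taking effect for the driver and the heap has to be raised when the JVM is launched.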
Upvotes: 2