spark-1.5.1 throwing out of memory error for hive 1.2.0 using HiveContext in java code

Question

I have a spark-1.5.1 for HADOOP 2.6 running in stand alone mode on my local machine. I am trying to run a hive query from a sample java application, pointing spark.master to (spark://impetus-i0248u:7077) spark master running on my local machine. Here is the piece of java code:

 SparkConf sparkconf = new SparkConf().set("spark.master", "spark://impetus-i0248u:7077").set("spark.app.name", "sparkhivesqltest")
        .set("spark.cores.max", "2").set("spark.executor.memory", "2g").set("worker_max_heapsize","2g").set("spark.driver.memory", "2g");

 SparkContext sc = new SparkContext(sparkconf);

HiveContext sqlContext = new HiveContext(sc);
DataFrame jdbcDF = sqlContext.sql("select * from bm.rutest");

List employeeFullNameRows = jdbcDF.collectAsList();

HiveContext is getting initialized properly as it is able to establish connection with hive metastore. I am getting exception at jdbcDF.collectAsList();

Here is the error coming when spark tries to submit the job:

Submitting 15/12/10 20:00:42 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at collectAsList at HiveJdbcTest.java:30) 15/12/10 20:00:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 15/12/10 20:00:42 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.26.52.54, ANY, 2181 bytes) 15/12/10 20:00:42 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.26.52.54, ANY, 2181 bytes)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-5" Exception in thread "shuffle-server-1" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "shuffle-server-1" Exception in thread "threadDeathWatcher-2-1" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "threadDeathWatcher-2-1"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-6" Exception in thread "qtp1003369013-56" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp1003369013-56"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-21"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.actor.default-dispatcher-17"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.remote.default-remote-dispatcher-23" Exception in thread "shuffle-server-2" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "shuffle-server-2"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkDriver-akka.actor.default-dispatcher-2"

Below is the configuration added in spark-env.sh

SPARK_EXECUTOR_CORES=2
SPARK_EXECUTOR_MEMORY=3G
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2G
SPARK_EXECUTOR_INSTANCES=2
SPARK_WORKER_INSTANCES=1

If I set, spark.master to local[*], it works fine but when I point it to master running on my machine, I get this above mentioned exception. If I try connecting to mysql db, with the same configuration, it works fine.

PS: The table has only single row.

Please help..!

spark-1.5.1 throwing out of memory error for hive 1.2.0 using HiveContext in java code

Answers (1)

Related Questions