Lorenz Bernauer

Reputation: 245

Invalid Spark URL in local spark session

Since updating to Spark 2.3.0, tests that are run in my CI (Semaphore) fail due to an allegedly invalid Spark URL when creating the (local) Spark context:

18/03/07 03:07:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610
    at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
    at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
    at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:155)
    at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)

The Spark session is created as follows:

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .config("spark.broadcast.compress", "false")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.spill.compress", "false")
  .master("local[3]")
  .getOrCreate

Before updating to Spark 2.3.0, no problems were encountered in versions 2.2.1 and 2.1.0. Also, running the tests locally works fine.

Upvotes: 18

Views: 18349

Answers (9)

deepb1ue

Reputation: 31

Setting .config("spark.driver.host", "localhost") fixed the issue for me.

        SparkSession spark = SparkSession
            .builder()
            .config("spark.master", "local")
            .config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
            .config("spark.hadoop.fs.s3a.buffer.dir", "/tmp")
            .config("spark.driver.memory", "2048m")
            .config("spark.executor.memory", "2048m")
            .config("spark.driver.bindAddress", "127.0.0.1")
            .config("spark.driver.host", "localhost")
            .getOrCreate();

Upvotes: 3

Felipe Zschornack

Reputation: 141

If you don't want to change the environment variable, you can change the code to add the config to the SparkSession builder (as Hanisha said above).

In PySpark:

spark = SparkSession.builder.config("spark.driver.host", "localhost").getOrCreate()

Upvotes: 11

Rajitha Fernando

Reputation: 1865

As mentioned in the answers above, you need to change SPARK_LOCAL_HOSTNAME to localhost. On Windows, you can use the SET command: SET SPARK_LOCAL_HOSTNAME=localhost

However, SET is only temporary; you would have to run it again in every new terminal. Instead, you can use the SETX command, which is permanent:

SETX SPARK_LOCAL_HOSTNAME localhost

You can run the above command from anywhere; just open a command prompt and execute it. Note that unlike SET, SETX does not accept an equals sign; you separate the variable name and the value with a space.

If it succeeds, you will see a message like "SUCCESS: Specified value was saved".

You can also verify that the variable was added by typing SET in a different command prompt (or SET S, which lists the variables starting with the letter 'S'). You should see SPARK_LOCAL_HOSTNAME=localhost in the results, which would not be the case if you had used SET instead of SETX.

Upvotes: 4

AaronDT

Reputation: 4050

For anyone working in a Jupyter Notebook: adding %env SPARK_LOCAL_HOSTNAME=localhost to the very beginning of the cell solved it for me. Like so:

%env SPARK_LOCAL_HOSTNAME=localhost

# locate the local Spark installation and add pyspark to sys.path
import findspark
findspark.init()

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("Test")
sc = SparkContext(conf = conf)

Upvotes: 1

RaphaëlR

Reputation: 542

I would like to complement @Prakash Annadurai's answer by saying:

If you want the variable setting to persist after exiting the terminal, add it to your shell profile (e.g. ~/.bash_profile) with the same command:

export SPARK_LOCAL_HOSTNAME=localhost

Upvotes: 0

user3008410

Reputation: 848

Change your hostname so that it contains NO underscore.

This turns spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610 into spark://HeartbeatReceiver@LXCtrusty1802d57a40eb:44610

On Ubuntu, as root:

#hostnamectl status
#hostnamectl --static set-hostname LXCtrusty1802d57a40eb

#nano /etc/hosts
    127.0.0.1   LXCtrusty1802d57a40eb
#reboot 

Upvotes: 1

Nagireddy Hanisha

Reputation: 1440

This has been resolved by setting the SparkSession config "spark.driver.host" to the IP address.

It seems that this change is required from Spark 2.3 onwards.
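A minimal sketch of what this could look like in Scala, based on the builder from the question; the 127.0.0.1 value is only an example, use whatever address (or "localhost") resolves for your driver:

import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .config("spark.driver.host", "127.0.0.1") // example value; use your driver's IP or "localhost"
  .master("local[3]")
  .getOrCreate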

Upvotes: 11

YohanT

Reputation: 82

Try to run Spark locally, with as many worker threads as there are logical cores on your machine:

.master("local[*]")

Upvotes: 0

Prakash Annadurai

Reputation: 327

Change the SPARK_LOCAL_HOSTNAME environment variable to localhost and try:

export SPARK_LOCAL_HOSTNAME=localhost

Upvotes: 28
