Rens

Reputation: 187

Databricks Connect with IntelliJ + Python: Exception in thread "main" java.lang.NoSuchMethodError

I am trying to connect my Databricks cluster to my IDE.

I do not have Spark and/or Scala downloaded on my machine, but I did install PySpark (pip install pyspark). I set the necessary environment variables and made a folder Hadoop containing a folder bin, in which I placed a winutils.exe file.
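The layout described above can be sketched as follows (the C:\Hadoop path is an assumption; the actual folder can live anywhere, as long as HADOOP_HOME points at it):

```python
import os

# Sketch of the setup described above (Windows paths are assumptions):
# winutils.exe lives in C:\Hadoop\bin, and HADOOP_HOME points at C:\Hadoop.
hadoop_home = r"C:\Hadoop"
os.environ.setdefault("HADOOP_HOME", hadoop_home)
expected_winutils = hadoop_home + r"\bin\winutils.exe"
print("Expecting winutils at:", expected_winutils)
```

Spark on Windows looks up winutils.exe under %HADOOP_HOME%\bin, so the variable must point at the Hadoop folder itself, not at bin.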

This was a step-wise process in which, slowly but steadily, all my errors were solved, except for the last one:

import logging
from pyspark.sql import SparkSession
from pyspark import SparkConf

if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    spark.sparkContext.setLogLevel("OFF")

Gives

1/03/30 15:14:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "main" java.lang.NoSuchMethodError: py4j.GatewayServer$GatewayServerBuilder.securityManager(Lpy4j/security/Py4JSecurityManager;)Lpy4j/GatewayServer$GatewayServerBuilder;
    at org.apache.spark.api.python.Py4JServer.<init>(Py4JServer.scala:68)
    at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:37)
    at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The first warning is probably due to the fact that I do not have Hadoop/Spark installed. However, I read that as long as the Windows executable winutils.exe is in the bin folder of Hadoop, this should work. (Before I had winutils in that folder, other errors arose; I dealt with those by adding the winutils.exe file.) So my question is about the Exception in thread "main" error.

Any idea?

Upvotes: 4

Views: 1410

Answers (1)

Alex Ott

Reputation: 87249

You need to uninstall PySpark, as described in the Databricks Connect documentation:

Having both installed will cause errors when initializing the Spark context in Python. This can manifest in several ways, including “stream corrupted” or “class not found” errors. If you have PySpark installed in your Python environment, ensure it is uninstalled before installing databricks-connect.

So you need to do:

pip uninstall pyspark
pip uninstall databricks-connect
pip install -U databricks-connect==5.5.*  # or X.Y.* to match your cluster version.
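To double-check that the conflict is gone, a quick sketch (a hypothetical helper, not part of databricks-connect; assumes Python 3.8+ for importlib.metadata) that reports whether both packages are still present in the current environment:

```python
from importlib.metadata import distributions

# Hypothetical check: databricks-connect bundles its own pyspark, so a plain
# `pip install pyspark` alongside it causes errors like the NoSuchMethodError
# above. This reports whether both distributions are installed.
def conflicting_spark_installs():
    names = {(dist.metadata["Name"] or "").lower() for dist in distributions()}
    return "pyspark" in names and "databricks-connect" in names

print("conflict:", conflicting_spark_installs())
```

After reinstalling, running `databricks-connect configure` and then `databricks-connect test` should confirm the connection to the cluster works.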

Upvotes: 5
