Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist

I am trying to run a spark session in the Jupyter Notebook on a EC2 Linux machine via Visual Studio Code. My code looks as following:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("spark_app").getOrCreate()

the error is:

{
    "name": "Py4JError",
    "message": "An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:\npy4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\n",
    "stack": "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mPy4JError\u001b[0m                                 Traceback (most recent call last)\n\u001b[1;32mc:\\Users\\IrinaKaerkkaenen\\Projekte\\ZugPortal\\test.ipynb Cell 3'\u001b[0m in \u001b[0;36m<cell line: 2>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      <a href='vscode-notebook-cell:/c%3A/Users/IrinaKaerkkaenen/Projekte/ZugPortal/test.ipynb#ch0000002?line=0'>1</a>\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mpyspark\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39msql\u001b[39;00m \u001b[39mimport\u001b[39;00m SparkSession\n\u001b[0;32m----> <a href='vscode-notebook-cell:/c%3A/Users/IrinaKaerkkaenen/Projekte/ZugPortal/test.ipynb#ch0000002?line=1'>2</a>\u001b[0m spark \u001b[39m=\u001b[39m SparkSession\u001b[39m.\u001b[39;49mbuilder\u001b[39m.\u001b[39;49mappName(\u001b[39m\"\u001b[39;49m\u001b[39mspark_app\u001b[39;49m\u001b[39m\"\u001b[39;49m)\u001b[39m.\u001b[39;49mgetOrCreate()\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/pyspark/sql/session.py:272\u001b[0m, in \u001b[0;36mSparkSession.Builder.getOrCreate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m    269\u001b[0m     sc \u001b[39m=\u001b[39m SparkContext\u001b[39m.\u001b[39mgetOrCreate(sparkConf)\n\u001b[1;32m    270\u001b[0m     \u001b[39m# Do not update `SparkConf` for existing `SparkContext`, as it's shared\u001b[39;00m\n\u001b[1;32m    271\u001b[0m     \u001b[39m# by all sessions.\u001b[39;00m\n\u001b[0;32m--> 272\u001b[0m     session \u001b[39m=\u001b[39m SparkSession(sc, options\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_options)\n\u001b[1;32m    273\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    274\u001b[0m     \u001b[39mgetattr\u001b[39m(\n\u001b[1;32m    275\u001b[0m         \u001b[39mgetattr\u001b[39m(session\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m    276\u001b[0m     )\u001b[39m.\u001b[39mapplyModifiableSettings(session\u001b[39m.\u001b[39m_jsparkSession, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_options)\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/pyspark/sql/session.py:307\u001b[0m, in \u001b[0;36mSparkSession.__init__\u001b[0;34m(self, sparkContext, jsparkSession, options)\u001b[0m\n\u001b[1;32m    303\u001b[0m         \u001b[39mgetattr\u001b[39m(\u001b[39mgetattr\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m)\u001b[39m.\u001b[39mapplyModifiableSettings(\n\u001b[1;32m    304\u001b[0m             jsparkSession, options\n\u001b[1;32m    305\u001b[0m         )\n\u001b[1;32m    306\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 307\u001b[0m         jsparkSession \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_jvm\u001b[39m.\u001b[39;49mSparkSession(\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_jsc\u001b[39m.\u001b[39;49msc(), options)\n\u001b[1;32m    308\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    309\u001b[0m     \u001b[39mgetattr\u001b[39m(\u001b[39mgetattr\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_jvm, \u001b[39m\"\u001b[39m\u001b[39mSparkSession$\u001b[39m\u001b[39m\"\u001b[39m), \u001b[39m\"\u001b[39m\u001b[39mMODULE$\u001b[39m\u001b[39m\"\u001b[39m)\u001b[39m.\u001b[39mapplyModifiableSettings(\n\u001b[1;32m    310\u001b[0m         jsparkSession, options\n\u001b[1;32m    311\u001b[0m     )\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/py4j/java_gateway.py:1585\u001b[0m, in \u001b[0;36mJavaClass.__call__\u001b[0;34m(self, *args)\u001b[0m\n\u001b[1;32m   1579\u001b[0m command \u001b[39m=\u001b[39m proto\u001b[39m.\u001b[39mCONSTRUCTOR_COMMAND_NAME \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1580\u001b[0m     \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_command_header \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1581\u001b[0m     args_command \u001b[39m+\u001b[39m\\\n\u001b[1;32m   1582\u001b[0m     proto\u001b[39m.\u001b[39mEND_COMMAND_PART\n\u001b[1;32m   1584\u001b[0m answer \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_gateway_client\u001b[39m.\u001b[39msend_command(command)\n\u001b[0;32m-> 1585\u001b[0m return_value \u001b[39m=\u001b[39m get_return_value(\n\u001b[1;32m   1586\u001b[0m     answer, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_gateway_client, \u001b[39mNone\u001b[39;49;00m, \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_fqn)\n\u001b[1;32m   1588\u001b[0m \u001b[39mfor\u001b[39;00m temp_arg \u001b[39min\u001b[39;00m temp_args:\n\u001b[1;32m   1589\u001b[0m     temp_arg\u001b[39m.\u001b[39m_detach()\n\nFile \u001b[0;32m~/anaconda3/lib/python3.9/site-packages/py4j/protocol.py:330\u001b[0m, in \u001b[0;36mget_return_value\u001b[0;34m(answer, gateway_client, target_id, name)\u001b[0m\n\u001b[1;32m    326\u001b[0m         \u001b[39mraise\u001b[39;00m Py4JJavaError(\n\u001b[1;32m    327\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    328\u001b[0m             \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name), value)\n\u001b[1;32m    329\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 330\u001b[0m         \u001b[39mraise\u001b[39;00m Py4JError(\n\u001b[1;32m    331\u001b[0m             \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m. Trace:\u001b[39m\u001b[39m\\n\u001b[39;00m\u001b[39m{3}\u001b[39;00m\u001b[39m\\n\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    332\u001b[0m             \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name, value))\n\u001b[1;32m    333\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m    334\u001b[0m     \u001b[39mraise\u001b[39;00m Py4JError(\n\u001b[1;32m    335\u001b[0m         \u001b[39m\"\u001b[39m\u001b[39mAn error occurred while calling \u001b[39m\u001b[39m{0}\u001b[39;00m\u001b[39m{1}\u001b[39;00m\u001b[39m{2}\u001b[39;00m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\n\u001b[1;32m    336\u001b[0m         \u001b[39mformat\u001b[39m(target_id, \u001b[39m\"\u001b[39m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m, name))\n\n\u001b[0;31mPy4JError\u001b[0m: An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:\npy4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)\n\tat py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)\n\tat py4j.Gateway.invoke(Gateway.java:237)\n\tat py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)\n\tat py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\n"
}

The output of running the cell before I read the full error in text editor is the following

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
/tmp/ipykernel_5260/8684085.py in <module>
      1 from pyspark.sql import SparkSession
----> 2 spark = SparkSession.builder.appName("spark_app").getOrCreate()

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in getOrCreate(self)
    270                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    271                     # by all sessions.
--> 272                     session = SparkSession(sc, options=self._options)
    273                 else:
    274                     getattr(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/pyspark/sql/session.py in __init__(self, sparkContext, jsparkSession, options)
    305                 )
    306             else:
--> 307                 jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)
    308         else:
    309             getattr(getattr(self._jvm, "SparkSession$"), "MODULE$").applyModifiableSettings(

~/anaconda3/envs/zupo_env_test1/lib64/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1584         answer = self._gateway_client.send_command(command)
   1585         return_value = get_return_value(
-> 1586             answer, self._gateway_client, None, self._fqn)
   1587 
...
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)

I have googled a lot without a success. Does anybody has an idea what is wrong?

I use IPython Kernel with 3.9 Python installed.

The warnings before the error comes:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/ec2-user/spark/spark-3.1.2-bin-hadoop2.7/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/07/05 21:06:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

Upvotes: 21

Answers (3)

Neelesh Kumar Asati

Reputation: 1

pip install pyspark==3.2.1

#It worked for me also

Upvotes: 0

Mostafa Ghadimi

Reputation: 6766

It appears that the version of spark installed in your machine does not match the version of pyspark.

Check the version of your spark using the following command:

<path-to-spark-bin>/spark-submit --version
# example output
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.3
      /_/

Now as it appears from the sample output, the version of installed Spark version is 3.1.3, so you need to install the python library of spark (pyspark) with the same version by executing the following command:

pip install pyspark==<the-version-of-your-spark>
# Example
pip install pyspark==3.1.3

Upvotes: 12

Luis Hassiel Figueroa

Reputation: 301

I had the same problem, I've fixed it installing the same version of pyspark from pip and spark. you should check if your installed versions are the same.

Upvotes: 30

Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist

Answers (3)

Related Questions