Reputation: 95
I created an exe file for my PySpark script using PyInstaller, but when I run the exe I get the error "No module named 'py4j.java_collections'".
pyinstaller version - 3.5
python version - Python 3.7.1
Spark version - 2.4.3
OS - Windows 10
I build the exe with the command pyinstaller -F myscript.py. After looking through other threads, I added the following variables to the Windows environment variables:
PYTHONPATH = $SPARK_HOME/python/:$PYTHONPATH;$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
SPARK_HOME = C:\Users\\Desktop\spark\spark-2.4.3-bin-hadoop2.7
My script also uses wx components for the input and message dialog boxes.
import glob
import wx
from pyspark.sql import SparkSession

# Spark session
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .config("spark.sql.session.timeZone", "UTC") \
    .config("spark.sql.execution.pandas.respectSessionTimeZone", "False") \
    .getOrCreate()

try:
    app = wx.App()
    app.MainLoop()
    # Ask the user for the directory that holds the Parquet files
    fileLocationBox = wx.TextEntryDialog(None, 'Enter File Directory :', 'Convert To CSV', 'Enter Parquet File Location')
    if fileLocationBox.ShowModal() == wx.ID_OK:
        fileLoc = fileLocationBox.GetValue()
    fileLocationBox.Destroy()
    # Collect the Parquet files and write them out as a single CSV
    files = [f for f in glob.glob(fileLoc + "**/*.parquet", recursive=True)]
    t = spark.read.load(files)
    t.coalesce(1).write.csv(fileLoc+'\\OutputCSV', header=True)
    dlg = wx.MessageDialog(None, "Output CSV generated", "Output", wx.OK)
    if dlg.ShowModal() == wx.ID_OK:
        dlg.Destroy()
except Exception as e:
    print("ERROR:")
    print(type(e))  # the exception type
    print(e.args)
I have run out of options. Please help with this issue.
Upvotes: 6
Views: 945
Reputation: 1292
For anyone still looking for a solution: I ran into exactly the same issue. The file py4j/java_gateway.py contains the statement __import__("py4j.java_collections"). Because that import is made through a runtime call rather than a regular import statement, PyInstaller does not detect it by default while building the executable, so the module py4j.java_collections is left out of the bundle and cannot be found at run time.
Adding the module with the --hidden-import option at build time makes the error go away.
The command in my case looked like:
pyinstaller \
--paths /root/anaconda3/envs/dataset_sm_temp/lib/python3.8/site-packages \
--hidden-import=py4j.java_collections \
similarity_util.py
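To illustrate why the import goes unnoticed, here is a minimal sketch. It assumes, as a simplification, that a build-time analysis only looks at compiled import statements (the IMPORT_NAME bytecode instruction); PyInstaller's real analysis is more involved. The helper name import_name_targets is hypothetical and not part of PyInstaller or py4j. The source strings are only compiled, never executed, so py4j does not need to be installed.

```python
import dis


def import_name_targets(source):
    """Return module names referenced by IMPORT_NAME instructions in source.

    Hypothetical helper for illustration only; it is a simplified
    stand-in for a static import scan, not PyInstaller's own code.
    """
    code = compile(source, "<sketch>", "exec")  # compile only, never run
    return [
        ins.argval
        for ins in dis.get_instructions(code)
        if ins.opname == "IMPORT_NAME"
    ]


# A regular import statement is visible to the scan.
static_src = "import py4j.java_collections"

# The form used in py4j/java_gateway.py is an ordinary function call
# with a string argument, so no IMPORT_NAME instruction is emitted.
dynamic_src = '__import__("py4j.java_collections")'

print(import_name_targets(static_src))
print(import_name_targets(dynamic_src))
```

The second call finds nothing, which is the gap that --hidden-import fills. Applied to the one-file build from the question, the command would presumably be pyinstaller -F --hidden-import=py4j.java_collections myscript.py.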
Upvotes: 1