KS17

Reputation: 95

No module named 'py4j.java_collections' when running exe created using pyinstaller

I created an exe file for my PySpark script using pyinstaller, but when I run the exe I get the error "No module named 'py4j.java_collections'".

pyinstaller version - 3.5

python version - Python 3.7.1

Spark version - 2.4.3

OS - Windows 10

I create the exe with the command pyinstaller -F myscript.py. After reading other threads, I added the following variables to the Windows environment variables:

PYTHONPATH = $SPARK_HOME/python/:$PYTHONPATH;$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH

SPARK_HOME = C:\Users\\Desktop\spark\spark-2.4.3-bin-hadoop2.7

My script also uses wx components for the input and message dialog boxes.

import glob

import wx  # needed for the wx.App and dialog calls below
from pyspark.sql import SparkSession

#Spark Session
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .config("spark.sql.session.timeZone", "UTC") \
    .config("spark.sql.execution.pandas.respectSessionTimeZone", "False") \
    .getOrCreate()

try:
    app = wx.App()
    app.MainLoop()

    fileLocationBox = wx.TextEntryDialog(None, 'Enter File Directory :', 'Convert To CSV', 'Enter Parquet File Location')

    if fileLocationBox.ShowModal() == wx.ID_OK:
        fileLoc = fileLocationBox.GetValue()

    fileLocationBox.Destroy()

    files = [f for f in glob.glob(fileLoc + "**/*.parquet", recursive=True)]

    t = spark.read.load(files)

    t.coalesce(1).write.csv(fileLoc+'\\OutputCSV', header=True)

    dlg = wx.MessageDialog(None, "Output CSV generated", "Output", wx.OK)
    if dlg.ShowModal() == wx.ID_OK:
        dlg.Destroy()

except Exception as e:
    print("ERROR:")
    print(type(e))   # the exception class
    print(e.args)    # arguments stored on the exception instance

I have run out of options. Please help on this issue.

Upvotes: 6

Views: 945

Answers (1)

Rachit Tayal

Reputation: 1292

For anyone still looking for a solution: I ran into exactly the same issue. The py4j/java_gateway.py file contains the statement __import__("py4j.java_collections"). Because the module name is passed as a string at runtime rather than written as a regular import statement, pyinstaller does not detect this import by default when building the executable, so py4j.java_collections is left out of the bundle and cannot be found when the exe runs.
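To illustrate why an import written as a string can slip past static analysis, here is a minimal sketch. It is not PyInstaller's actual analysis, which is more sophisticated; `static_imports` and `SAMPLE` are hypothetical names made up for this example. The scanner only looks at regular import statements, so the `__import__("...")` call goes unnoticed.

```python
import ast


def static_imports(source):
    """Collect module names from plain `import` / `from ... import` statements.

    Hypothetical, deliberately naive scanner used only for illustration.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


# The source is only parsed, never executed, so py4j need not be installed.
SAMPLE = '''
import os
from collections import OrderedDict
__import__("py4j.java_collections")
'''

detected = static_imports(SAMPLE)
print(sorted(detected))
print("py4j.java_collections" in detected)
```

The dynamic import is an ordinary function call with a string argument, so a scanner that looks only at import statements never sees the module name.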

Passing the module with --hidden-import at build time makes the error go away. The command in my case looked like this:

pyinstaller \
--paths /root/anaconda3/envs/dataset_sm_temp/lib/python3.8/site-packages \
--hidden-import=py4j.java_collections \
similarity_util.py
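For the one-file build from the question, the same flag can presumably be combined with -F. The sketch below only assembles the command line; `pyinstaller_command` is a hypothetical helper, not part of PyInstaller, and the script name is taken from the question.

```python
def pyinstaller_command(script, hidden_imports=(), onefile=False, paths=()):
    """Build a PyInstaller argument list (hypothetical helper for illustration)."""
    cmd = ["pyinstaller"]
    if onefile:
        cmd.append("-F")  # bundle everything into a single executable
    for path in paths:
        cmd += ["--paths", path]  # extra directories to search for imports
    for module in hidden_imports:
        cmd.append(f"--hidden-import={module}")  # modules static analysis misses
    cmd.append(script)
    return cmd


cmd = pyinstaller_command(
    "myscript.py",
    hidden_imports=["py4j.java_collections"],
    onefile=True,
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once pyinstaller is on the PATH. Other modules reported as missing at runtime can be added to hidden_imports in the same way.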

Upvotes: 1
