user1050619

Reputation: 20856

Spark: ImportError: No module named

I have a simple Spark program and I get the following error:

Error:

ImportError: No module named add_num

Command used to run:

./bin/spark-submit /Users/workflow/test_task.py

Code:

from __future__ import print_function
from pyspark.sql import SparkSession
from add_num import add_two_nos

def map_func(x):    
    print(add_two_nos(5))
    return x*x

def main():
    spark = SparkSession\
        .builder\
        .appName("test-task")\
        .master("local[*]")\
        .getOrCreate()      
    rdd = spark.sparkContext.parallelize([1,2,3,4,5]) # distribute the list as an RDD
    rdd = rdd.map(map_func) # apply map_func to every element
    print(rdd.collect())    
    spark.stop()

if __name__ == "__main__":  
    main()

Function code (add_num.py):

def add_two_nos(x):
    return x*x

Upvotes: 3

Views: 6344

Answers (2)

Bhuvan Gupta

Reputation: 66

You can specify the .py file from which you wish to import in the code itself by adding the statement sc.addPyFile(path).
The path passed can be either a local file, a file in HDFS (or another Hadoop-supported filesystem), or an HTTP, HTTPS or FTP URI.
Then use from add_num import add_two_nos as before.

Upvotes: 3

Shubham Jain

Reputation: 422

You need to include a zip containing add_num.py in your spark-submit command.

./bin/spark-submit --py-files sources.zip /Users/workflow/test_task.py 

When submitting a Python application to Spark, all source files imported by the main file (here test_task.py) should be packaged in egg or zip format and supplied to Spark with the --py-files option. If the main file needs only one other file, you can supply it directly without zipping it.

./bin/spark-submit --py-files add_num.py /Users/workflow/test_task.py

The above command should also work, since only one other Python source file is required.
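For the multi-file case, the zip can be built with Python's standard-library zipfile CLI, so no separate zip tool is needed (a sketch; the helper is recreated in a scratch directory here so it runs anywhere, and the spark-submit paths are the question's):

```shell
# Recreate the question's helper module in a scratch directory
workdir=$(mktemp -d)
printf 'def add_two_nos(x):\n    return x*x\n' > "$workdir/add_num.py"

# Package it; the archive stores add_num.py at its top level,
# which is the layout --py-files expects
python3 -m zipfile -c "$workdir/sources.zip" "$workdir/add_num.py"
python3 -m zipfile -l "$workdir/sources.zip"

# Then submit as in the answer:
# ./bin/spark-submit --py-files "$workdir/sources.zip" /Users/workflow/test_task.py
```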

Upvotes: 0
