dzoku

Reputation: 77

Spark error when running a Python script on Databricks

I have the following basic script, which works fine when I run it from PyCharm on my machine.

from pyspark.sql import SparkSession

print("START")

spark = SparkSession \
    .Builder() \
    .appName("myapp") \
    .master('local[*, 4]') \
    .getOrCreate()

print(spark)

data = [('James', '', 'Smith', '1991-04-01', 'M', 3000),
        ('Michael', 'Rose', '', '2000-05-19', 'M', 4000),
        ('Robert', '', 'Williams', '1978-09-05', 'M', 4000),
        ('Maria', 'Anne', 'Jones', '1967-12-01', 'F', 4000),
        ('Jen', 'Mary', 'Brown', '1980-02-17', 'F', -1)
        ]

columns = ["firstname", "middlename", "lastname", "dob", "gender", "salary"]
df = spark.createDataFrame(data=data, schema=columns)
print(df)

However, when I try to run it on a Databricks cluster, directly as a Python script, it gives an error.

START
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Workspace/Repos/***********/sdk_test/tests/snippets/spark_tests.py", line 13, in <module>
    class SparkTests:
  File "/Workspace/Repos/*******/sdk_test/tests/snippets/spark_tests.py", line 16, in SparkTests
    sc = SparkContext.getOrCreate()
  File "/databricks/spark/python/pyspark/context.py", line 400, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/databricks/spark/python/pyspark/context.py", line 147, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "/databricks/spark/python/pyspark/context.py", line 192, in _do_init
    raise RuntimeError("A master URL must be set in your configuration")
RuntimeError: A master URL must be set in your configuration
CalledProcessError: Command 'b'cd ../\n\n/databricks/python3/bin/python -m tests.snippets.spark_tests\n# python -m tests.runner --env=qa --runtime_env=databricks --upload=True --package=sdk\n'' returned non-zero exit status 1.

What am I missing?

Upvotes: 2

Views: 998

Answers (2)

Soumyadeep Debnath

Reputation: 1

Your code works absolutely fine in Databricks:

Output Image

I suggest restarting your Databricks cluster and running the script again. Also check the command-line arguments you use to run the script.

Upvotes: 0

Alex Ott

Reputation: 87259

Remove the .master('local[*, 4]') \ line from your code. On Databricks the master is set automatically.
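One way to keep a single script that runs both locally and on Databricks is to set the master only when the script is not running on a cluster. The sketch below is an illustration of that idea, not something from the original question: the helper names running_on_databricks and build_spark are made up here, and it assumes the Databricks runtime exposes the DATABRICKS_RUNTIME_VERSION environment variable to the process running the script.

```python
import os


def running_on_databricks() -> bool:
    # Assumption: Databricks runtimes export DATABRICKS_RUNTIME_VERSION.
    return "DATABRICKS_RUNTIME_VERSION" in os.environ


def build_spark(app_name: str = "myapp"):
    # Imported lazily so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if not running_on_databricks():
        # Local runs (e.g. from PyCharm) still need an explicit master.
        builder = builder.master("local[*, 4]")
    # On Databricks no master is set here; the platform is expected to supply it.
    return builder.getOrCreate()
```

In the script from the question, spark = build_spark() would then replace the whole builder chain. If the error persists, it may be worth checking how the script is launched, since the traceback comes from a SparkContext.getOrCreate() call inside spark_tests.py rather than from the builder shown above.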

Upvotes: 0
