codingnoob

Reputation: 63

How do I wake up the worker nodes when the driver node is doing all the work?

I am running a very simple script in Databricks:

import sys

try:
    # Delete all rows for this source database from the raw table
    spark.sql(
        "DELETE FROM raw.{} WHERE databasename = '{}'".format(raw_json, dbsourcename)
    )
    print("Deleting for {}".format(raw_json))
except Exception as e:
    print("Error deleting from raw.{} error message: {}".format(raw_json, e))
    sys.exit("Exiting notebook")

The script accepts a JSON parameter of the form:

 [{"table_name": "table1"}, 
{"table_name": "table2"}, 
{"table_name": "table3"}, 
{"table_name": "table4"},... ]

This code sits inside a for loop that cycles through each table_name input:

(screenshot of the for loop omitted)
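
Roughly, the loop looks like this (simplified, since the screenshot is not shown):

for table in tables:
    raw_json = table["table_name"]
    # the try/except DELETE block shown above runs here for each table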

My workflow runs successfully, but it never seems to wake up the worker nodes. The cluster metrics confirm this: (screenshot of cluster metrics omitted)

I have configured my cluster to be memory optimised, and it was only after scaling up the driver node that the job finally ran successfully, which clearly shows the work lands on the driver and not the workers.

Any ideas on how I can distribute the workload to the workers?

Upvotes: 0

Views: 31

Answers (0)
