I am running a very simple script in Databricks:
import sys

try:
    spark.sql(
        "DELETE FROM raw.{} WHERE databasename = '{}'".format(raw_json, dbsourcename)
    )
    print("Deleting for {}".format(raw_json))
except Exception as e:
    print("Error deleting from raw.{}, error message: {}".format(raw_json, e))
    sys.exit("Exiting notebook")
This script is accepting a JSON parameter in the form of:
[{"table_name": "table1"},
{"table_name": "table2"},
{"table_name": "table3"},
{"table_name": "table4"},... ]
This script sits inside a for loop that cycles through each table_name input:
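For context, the loop follows roughly this pattern (a sketch only; the `json.loads` parsing, the helper name `build_delete_statements`, and the sample parameter values are assumptions, not the exact notebook code):

```python
import json

def build_delete_statements(tables_param, dbsourcename):
    """Build one DELETE statement per table_name entry in the JSON parameter."""
    return [
        "DELETE FROM raw.{} WHERE databasename = '{}'".format(
            entry["table_name"], dbsourcename
        )
        for entry in json.loads(tables_param)
    ]

# In the notebook, each statement would be passed to spark.sql(...) in turn.
# Note that spark.sql is coordinated by the driver, so the loop itself
# executes serially on the driver node.
stmts = build_delete_statements(
    '[{"table_name": "table1"}, {"table_name": "table2"}]', "mydb"
)
```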
My workflow runs successfully, but it never seems to wake up the worker nodes. The cluster metrics confirm this:
I have configured my cluster to be memory-optimised, and it was only after scaling up the driver node that the workflow finally ran successfully, which clearly shows the dependency is on the driver and not the workers.
Any ideas on how I can distribute the workload to workers?