porter22

Reputation: 63

Pass parameters/arguments to HDInsight/Spark Activity in Azure Data Factory

I have an on-demand HDInsight cluster that is launched from a Spark Activity within Azure Data Factory and runs PySpark 3.1. To test my code, I normally launch a Jupyter Notebook from the created HDInsight cluster's page.

Now, I would like to pass some parameters to that Spark activity and retrieve these parameters from within the Jupyter notebook code. I've tried doing so in two ways, but neither of them worked for me:

Method A: passed them as Arguments in the Spark activity and then tried to retrieve them using sys.argv.

Method B: passed them as Spark configuration and then tried to retrieve them using sc.getConf().getAll().
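
Roughly, the two retrieval attempts in the notebook looked like this (a minimal sketch; the parameter name and value are just placeholders, not my actual names):

from pyspark import SparkContext
import sys

# Method A: expected the activity's Arguments to show up in sys.argv
print(sys.argv)                  # e.g. ['...', 'myParamValue'] is what I hoped for

# Method B: expected a custom key (e.g. spark.myParam) among the Spark conf entries
sc = SparkContext.getOrCreate()  # in the HDInsight notebook, sc already exists
print(sc.getConf().getAll())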

I suspect that either I'm passing the parameters incorrectly from the pipeline, or I'm retrieving them the wrong way from the notebook.

Any pointers on how to pass parameters into the HDInsight Spark activity within Azure Data Factory would be much appreciated.


Upvotes: 0

Views: 332

Answers (1)

Saideep Arikontham

Reputation: 6104

The issue is with the entryFilePath. In the Spark activity of the HDInsight cluster, the entryFilePath must point to either a .jar file or a .py file. When you do this, you can successfully pass arguments to the job and read them using sys.argv.

  • The following is an example of how you can pass arguments to a Python script.

(screenshot: Spark activity settings with a .py entry file and an argument value)

  • The code inside nb1.py (sample) is as shown below:
from pyspark import SparkContext
from pyspark.sql import *
import sys

sc = SparkContext()
sqlContext = HiveContext(sc)

# Create an RDD from sample data which is already available
hvacText = sc.textFile("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")

# Create a schema for our data
Entry = Row('Date', 'Time', 'TargetTemp', 'ActualTemp', 'BuildingID')
# Parse the data and create a schema
hvacParts = hvacText.map(lambda s: s.split(',')).filter(lambda s: s[0] != 'Date')
hvac = hvacParts.map(lambda p: Entry(str(p[0]), str(p[1]), int(p[2]), int(p[3]), int(p[6])))

# Infer the schema and create a table       
hvacTable = sqlContext.createDataFrame(hvac)
hvacTable.registerTempTable('hvactemptable')
dfw = DataFrameWriter(hvacTable)

# Using the argument passed from the pipeline to name the table
dfw.saveAsTable(sys.argv[1])
  • When the pipeline is triggered, it runs successfully and the required table is created (the table name is passed as an argument from the pipeline's Spark activity). We can then query this table in the HDInsight cluster's Jupyter notebook using the following query:
select * from new_hvac

(screenshot: query results for new_hvac shown in the Jupyter notebook)
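
If you prefer to run that check from a PySpark cell rather than an SQL cell, a minimal sketch would be (assuming the notebook's PySpark kernel exposes the usual spark session; new_hvac is the table name passed from the pipeline in this example):

# Query the Hive table created by the pipeline-run script
spark.sql("select * from new_hvac").show()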

NOTE:

So, please ensure that you are passing the arguments to a Python script (.py file), not to a Python notebook.
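
As a minimal sketch of that pattern (the argument name below is a placeholder, not from the pipeline above), the entry .py file can guard against a missing argument before using it:

import sys

# The Spark activity's Arguments list is appended to sys.argv,
# so sys.argv[1] holds the first value passed from the pipeline.
if len(sys.argv) < 2:
    raise SystemExit("Expected a table name argument from the ADF pipeline")

table_name = sys.argv[1]
print(f"Creating table: {table_name}")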

Upvotes: 0
