danbar2001

Reputation: 23

How to pass variable arguments to a Spark DataFrame reader using PySpark?

I am using the Crealytics Spark Excel library to read an Excel workbook into a Spark DataFrame from a Databricks Python notebook.

Hardcoding the options like this works fine:

df = (spark.read.format("com.crealytics.spark.excel")
      .option("useHeader", "true")
      .option("dataAddress", "'Sheet1'!")
      .load("/FileStore/tables/Test.xlsx"))

I would like to read a dynamic list of options from a table into a Python structure (such as a list or dict) and pass these to the DataFrame reader as varargs.

However, it fails even when trying to pass in just one option:

test = {"useHeader": "true"}

df = (spark.read.format("com.crealytics.spark.excel")
      .option(*test)
      .option("dataAddress", "'Sheet1'!")
      .load("/FileStore/tables/Test.xlsx"))

TypeError: option() takes exactly 3 arguments (2 given)

Upvotes: 2

Views: 1644

Answers (1)

user11042628


Use options, not option:

options(**options)

Adds input options for the underlying data source.

As the signature shows, options takes keyword arguments, so double-star dictionary unpacking is a valid way to provide them. Your call fails because a single star (*test) unpacks only the dictionary's keys as positional arguments, so option receives one argument where it expects a key and a value.
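For example, a minimal sketch of the working read, reusing the path and options from your question:

test = {"useHeader": "true", "dataAddress": "'Sheet1'!"}

df = (spark.read.format("com.crealytics.spark.excel")
      .options(**test)  # ** unpacks the dict into keyword arguments
      .load("/FileStore/tables/Test.xlsx"))

And if the options live in a two-column table, collecting them into a dict could look like this sketch (the table name excel_options and the column names key and value are assumptions):

# Collect the small options table to the driver and build a dict from its rows
rows = spark.table("excel_options").collect()
test = {row["key"]: row["value"] for row in rows}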

Upvotes: 2
