Parameterise Spark partitionBy clause

I am trying to insert a DataFrame into an existing Hive partitioned table.

I would like to parameterize the partition columns, but my current approach does not work:

var partitioncolumn="\"deletion_flag\",\"date_feed\""
df.repartition(37).write.
  mode(SaveMode.Overwrite).
  partitionBy(partitioncolumn).
  insertInto("db.table_name")

How can I make this work?

Upvotes: 1

Views: 1738

Answers (2)

stefanobaghino

Reputation: 12804

partitionBy takes a variable number of arguments (namely, Strings).

def partitionBy(colNames: String*): DataFrameWriter[T]
//                              ^ this stands for variadic arguments

In Scala, you can postfix a sequence with `: _*` to expand it into an argument list.

So you could do something like the following:

val partitioncolumn = Seq("deletion_flag", "date_feed")
df.repartition(37).write.
  mode(SaveMode.Overwrite).
  partitionBy(partitioncolumn: _*).
  insertInto("db.table_name")

Passing a sequence as variadic arguments is also described in this Q&A.
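To see the expansion in isolation, here is a minimal, Spark-free sketch: a toy function with the same variadic `String*` signature as `partitionBy` (the function body and names here are illustrative, not Spark's implementation):

```scala
// Toy function mirroring the variadic signature of partitionBy.
def partitionBy(cols: String*): String =
  cols.mkString("partitionBy(", ", ", ")")

val partitioncolumn = Seq("deletion_flag", "date_feed")

// Passing the Seq directly would not compile;
// `: _*` expands it into individual arguments.
val result = partitionBy(partitioncolumn: _*)

println(result) // partitionBy(deletion_flag, date_feed)
```

The same `: _*` ascription works with any Scala varargs method, which is why it applies directly to `DataFrameWriter.partitionBy`.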

Upvotes: 1

user9294355

Reputation: 31

As partitionBy is defined with variadic arguments:

def partitionBy(colNames: String*): DataFrameWriter[T] 

It should be:

val partitioncolumn = Seq("deletion_flag", "date_feed")
df.repartition(37).write.mode(SaveMode.Overwrite).partitionBy(
   partitioncolumn: _*
).insertInto("db.table_name")

where you provide the expanded list of column names.

Upvotes: 3
