Reputation: 1134
I am trying to insert a DataFrame
in na existing Hive partitioned table.
I would like to parameterize by the partition columns but my current approach is not working:
var partitioncolumn="\"deletion_flag\",\"date_feed\""
df.repartition(37).write.
mode(SaveMode.Overwrite).
partitionBy(partitioncolumn).
insertInto("db.table_name")
How can I make this work?
Upvotes: 1
Views: 1738
Reputation: 12804
partitionBy
takes a variable number of arguments (namely, String
s).
def partitionBy(colNames: String*): DataFrameWriter[T]
// ^ this stands for variadic arguments
In Scala, you can pass postfix a sequence with : _*
to pass it as an argument list.
So you could do something like the following:
var partitioncolumn= Seq("deletion_flag", "date_feed")
df.repartition(37).write.
mode(SaveMode.Overwrite).
partitionBy(partitioncolumn: _*).
insertInto("db.table_name")
Passing a sequence as variadic arguments is also described in this Q&A.
Upvotes: 1
Reputation: 31
As partitionBy
is defined with variadic arguments:
def partitionBy(colNames: String*): DataFrameWriter[T]
It should be:
var partitioncolumn= Seq("deletion_flag", "date_feed")
df.repartition(37).write.mode(SaveMode.Overwrite).partitionBy(
partitioncolumn: _*
).insertInto("db.table_name")
where you provide expanded list of column names.
Upvotes: 3