Reputation: 2083
I have a dataframe: yearDF with the following columns: name, id_number, location, source_system_name, period_year
.
If I want to repartition the dataframe based on a column, I'd do:
yearDF.repartition('source_system_name')
I have a variable: val partition_columns = "source_system_name,period_year"
I tried to do it this way:
val dataDFPart = yearDF.repartition(col(${prtn_String_columns}))
but I get a compilation error: cannot resolve the symbol $
Is there anyway I can repartition the dataframe: yearDF
based on the values in partition_columns
Upvotes: 0
Views: 8878
Reputation: 1553
There are three implementations of the repartition function in Scala / Spark :
def repartition(partitionExprs: Column*): Dataset[T]
def repartition(numPartitions: Int, partitionExprs: Column*): Dataset[T]
def repartition(numPartitions: Int): Dataset[T]
So in order to repartition on multiple columns, you can try to split your field by the comma and use the vararg operator of Scala on it, like this :
val columns = partition_columns.split(",").map(x => col(x))
yearDF.repartition(columns: _*)
Another way to do it, is to call every col one by one :
yearDF.repartition(col("source_system_name"), col("period_year"))
Upvotes: 4