Reputation: 3599
import org.apache.spark.sql.expressions.Window

val partitionsColumns = "idnum,monthnum"
val partitionsColumnsList = partitionsColumns.split(",").toList
val loc = "/data/omega/published/invoice"
val df = sqlContext.read.parquet(loc)
// This line fails to compile with the error below.
val windowFunction = Window.partitionBy(partitionsColumnsList: _*).orderBy(df("effective_date").desc)
<console>:38: error: overloaded method value partitionBy with alternatives:
  (cols: org.apache.spark.sql.Column*)org.apache.spark.sql.expressions.WindowSpec <and>
  (colName: String,colNames: String*)org.apache.spark.sql.expressions.WindowSpec
 cannot be applied to (String)
       val windowFunction = Window.partitionBy(partitionsColumnsList:_*).orderBy(df("effective_date").desc)
Is it possible to pass a List of columns to the partitionBy method in Spark/Scala?
Passing a single column to partitionBy works, but I don't know how to pass multiple columns. Basically, I want to pass a List of columns to partitionBy.
The Spark version is 1.6.
Upvotes: 2
Views: 6509
Reputation: 10082
Window.partitionBy has the following definitions:
static WindowSpec partitionBy(Column... cols)
Creates a WindowSpec with the partitioning defined.
static WindowSpec partitionBy(scala.collection.Seq<Column> cols)
Creates a WindowSpec with the partitioning defined.
static WindowSpec partitionBy(String colName, scala.collection.Seq<String> colNames)
Creates a WindowSpec with the partitioning defined.
static WindowSpec partitionBy(String colName, String... colNames)
Creates a WindowSpec with the partitioning defined.
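To see why a List[String] expanded with `: _*` matches neither Scala-visible overload, here is a minimal sketch that runs without Spark. The `OverloadSketch` object and its `Col` class are hypothetical stand-ins that only mirror the shapes of `partitionBy(cols: Column*)` and `partitionBy(colName: String, colNames: String*)`; they are not Spark APIs.

```scala
// Hypothetical stand-in mirroring the two Scala overloads of Window.partitionBy.
// Nothing here is Spark code; it only reproduces the parameter shapes.
object OverloadSketch {
  final case class Col(name: String)

  // Same shape as partitionBy(cols: Column*)
  def partitionBy(cols: Col*): Seq[String] = cols.map(_.name)

  // Same shape as partitionBy(colName: String, colNames: String*)
  def partitionBy(colName: String, colNames: String*): Seq[String] =
    colName +: colNames

  val names = List("idnum", "monthnum")

  // partitionBy(names: _*)  // does not compile: no overload accepts String* alone

  // Option 1: turn each name into a column, then expand the sequence.
  val viaColumns = partitionBy(names.map(Col(_)): _*)

  // Option 2: pass the first name on its own and expand the remaining names.
  val viaStrings = partitionBy(names.head, names.tail: _*)
}
```

Both options resolve to exactly one overload, which is what the two calls suggested in this answer rely on.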
With your example,
val partitionsColumnsList = partitionsColumns.split(",").toList
You can use it like:
Window.partitionBy(partitionsColumnsList.map(col(_)):_*).orderBy(df("effective_date").desc)
Or
Window.partitionBy(partitionsColumnsList.head, partitionsColumnsList.tail: _*).orderBy(df("effective_date").desc)
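For completeness, a sketch of the first variant with the imports it needs. It assumes a spark-shell session where `sqlContext` is already defined, and that the Parquet data has the columns idnum, monthnum and effective_date from the question. The `row_number` step and the names `windowSpec`, `latest` and `rn` are illustrative assumptions about how the window might be used, not part of the question.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Assumes `sqlContext` is provided by the shell, as in the question.
val partitionsColumnsList = "idnum,monthnum".split(",").toList
val df = sqlContext.read.parquet("/data/omega/published/invoice")

// Map each column name to a Column and expand the list as varargs.
val windowSpec = Window
  .partitionBy(partitionsColumnsList.map(col(_)): _*)
  .orderBy(col("effective_date").desc)

// Illustrative use: keep the most recent row per (idnum, monthnum).
val latest = df
  .withColumn("rn", row_number().over(windowSpec))
  .where(col("rn") === 1)
  .drop("rn")
```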
Upvotes: 5
Reputation: 2998
You can also partition by multiple columns by assigning the column names to a list variable and passing that list to partitionBy. Because the String overload takes a first name followed by varargs, split the list into its head and tail:
val partitioncolumns = List("idnum", "monthnum")
val w = Window.partitionBy(partitioncolumns.head, partitioncolumns.tail: _*).orderBy(df("effective_date").desc)
Upvotes: 1
Reputation: 61
The code below worked for me:
Window.partitionBy(partitionsColumnsList.map(col(_)):_*).orderBy(df("effective_date").desc)
Upvotes: 0