Surender Raja

Reputation: 3599

How do I pass multiple columns to Window.partitionBy in Spark Scala?

val partitionsColumns = "idnum,monthnum"
val partitionsColumnsList = partitionsColumns.split(",").toList
val loc = "/data/omega/published/invoice"
val df = sqlContext.read.parquet(loc)
val windowFunction = Window.partitionBy(partitionsColumnsList:_*).orderBy(df("effective_date").desc)
<console>:38: error: overloaded method value partitionBy with alternatives:
(cols: org.apache.spark.sql.Column*)     org.apache.spark.sql.expressions.WindowSpec <and>
(colName: String,colNames: String*)org.apache.spark.sql.expressions.WindowSpec
cannot be applied to (String)
val windowFunction = Window.partitionBy(partitionsColumnsList:_*).orderBy(df("effective_date").desc)

Is it possible to pass a list of columns to the partitionBy method in Spark/Scala?

Passing a single column to partitionBy works, but I don't know how to pass several columns.

Basically, I want to pass a List of column names to partitionBy.

Spark version is 1.6.

Upvotes: 2

Views: 6509

Answers (3)

philantrovert

Reputation: 10082

Window.partitionBy has the following definitions:

static WindowSpec partitionBy(Column... cols) 

Creates a WindowSpec with the partitioning defined.

static WindowSpec partitionBy(scala.collection.Seq<Column> cols)

Creates a WindowSpec with the partitioning defined.

static WindowSpec partitionBy(String colName, scala.collection.Seq<String> colNames) 

Creates a WindowSpec with the partitioning defined.

static WindowSpec partitionBy(String colName, String... colNames)

Creates a WindowSpec with the partitioning defined.

With your example,

val partitionsColumnsList = partitionsColumns.split(",").toList

you can map each name to a Column (this needs import org.apache.spark.sql.functions.col):

Window.partitionBy(partitionsColumnsList.map(col(_)): _*).orderBy(df("effective_date").desc)

Or pass the first name on its own and the remaining names as varargs:

Window.partitionBy(partitionsColumnsList.head, partitionsColumnsList.tail: _*).orderBy(df("effective_date").desc)
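To see why the head/tail split is needed, here is a minimal pure-Scala sketch with no Spark dependency. The object PartitionByShape and its partitionBy are hypothetical stand-ins that only mirror the parameter shape of the String overload (one required name followed by varargs); they are not part of the Spark API.

```scala
// Hypothetical stand-in for the String overload of Window.partitionBy.
// It only mirrors the parameter shape: one required name, then varargs.
object PartitionByShape {
  def partitionBy(colName: String, colNames: String*): List[String] =
    colName :: colNames.toList

  // Splits a comma-separated string of names and passes them to the stand-in.
  // Assumes the input contains at least one name.
  def fromCsv(csv: String): List[String] = {
    val names = csv.split(",").map(_.trim).toList
    // partitionBy(names: _*) would not compile here: a `: _*` argument may
    // only be passed to the repeated parameter, so the required colName
    // would be left without a value.
    partitionBy(names.head, names.tail: _*)
  }
}
```

Calling PartitionByShape.fromCsv("idnum,monthnum") hands "idnum" to the required parameter and the rest of the list to the varargs slot, which is the same split the second Window.partitionBy call above performs.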

Upvotes: 5

Nikunj Kakadiya

Reputation: 2998

You can also keep the partition columns in a list and pass that list to partitionBy. The list has to hold Column objects (with org.apache.spark.sql.functions.col imported), because a List[String] expanded with :_* matches neither overload, which is the error shown in the question:

val partitioncolumns = List(col("idnum"), col("monthnum"))
val w = Window.partitionBy(partitioncolumns: _*).orderBy(df("effective_date").desc)

Upvotes: 1

rads

Reputation: 61

The code below worked for me (with org.apache.spark.sql.functions.col imported):

Window.partitionBy(partitionsColumnsList.map(col(_)): _*).orderBy(df("effective_date").desc)

Upvotes: 0
