user7569898

Reputation:

Custom OrderBy in Spark SQL

I have two columns that need to be sorted in a custom way.

For example, a Month column should be sorted chronologically, from Jan 2015 through Dec of the current year.

Similarly, a Quarter column should be ordered as Q1-2015, Q2-2015, ..., Q4-CurrentYear.

In Spark SQL I would call orderBy("Month", "Quarter"), but the result should follow the custom sequence described above.

I have tried the code below:

import org.apache.spark.sql.SaveMode 
import org.apache.spark.storage.StorageLevel
val vDF = spark.sql("""select month, quarter from table group by month, quarter order by month, quarter""")
vDF.repartition(10).orderBy("Month", "Quarter").write.csv("results.csv")

As of now, Month is ordered alphabetically (Apr, Aug, Dec, ...) and Quarter as Q1-2015, Q1-2016, ..., but the requirement is the chronological order described above.


Upvotes: 2

Views: 1303

Answers (1)

Alper t. Turker

Reputation: 35229

I'd just parse the dates:

import org.apache.spark.sql.functions._

val df = Seq(
  ("Jul", "Q3-2017"), ("May", "Q2-2017"),
  ("Jan", "Q1-2016"), ("Dec", "Q4-2016"), ("Aug", "Q1-2016")
 ).toDF("month", "quarter")

df.orderBy(unix_timestamp(
  concat_ws(" ", col("month"), substring(col("quarter"), 4, 6)), "MMM yyyy"
)).show()

+-----+-------+                                                                 
|month|quarter|
+-----+-------+
|  Jan|Q1-2016|
|  Aug|Q1-2016|
|  Dec|Q4-2016|
|  May|Q2-2017|
|  Jul|Q3-2017|
+-----+-------+
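If only the quarter order matters, a sketch that avoids date parsing entirely is to split the `Q#-yyyy` string into numeric year and quarter keys with `substring` (1-indexed in Spark SQL) and sort on those. This assumes the quarter column always has the exact `Q#-yyyy` format shown above:

```scala
import org.apache.spark.sql.functions._

df.orderBy(
  // "Q1-2016" -> year 2016 (characters 4..7)
  substring(col("quarter"), 4, 4).cast("int"),
  // "Q1-2016" -> quarter number 1 (character 2)
  substring(col("quarter"), 2, 1).cast("int")
).show()
```

Because the sort keys are only used inside `orderBy`, they do not appear as extra columns in the output.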

Upvotes: 2
