Reputation:
I have two columns that need to be sorted in a custom way.
For example, a Month column that should be ordered from Jan-2015
through Dec of the current year,
and a Quarter column that should be ordered as Q1-2015,
Q2-2015,
..., Q4 of the current year.
In Spark SQL I would write orderBy("Month", "Quarter"),
but the order should follow the custom sequence described above.
I have tried the code below:
import org.apache.spark.sql.SaveMode
import org.apache.spark.storage.StorageLevel

val vDF = spark.sql("""select month, quarter from table group by month, quarter order by month, quarter""")
vDF.repartition(10).orderBy("Month", "Quarter").write.csv("results.csv")
As it stands, Month gets ordered alphabetically (Apr, Aug, Dec, ...) and Quarter as Q1-2015, Q1-2016, ..., but the requirement is as described above.
Upvotes: 2
Views: 1303
Reputation: 35229
I'd just parse the dates:
import org.apache.spark.sql.functions._
import spark.implicits._  // needed for toDF on a local Seq

val df = Seq(
  ("Jul", "Q3-2017"), ("May", "Q2-2017"),
  ("Jan", "Q1-2016"), ("Dec", "Q4-2016"), ("Aug", "Q1-2016")
).toDF("month", "quarter")

df.orderBy(unix_timestamp(
  concat_ws(" ", col("month"), substring(col("quarter"), 4, 6)), "MMM yyyy"
)).show()
+-----+-------+
|month|quarter|
+-----+-------+
| Jan|Q1-2016|
| Aug|Q1-2016|
| Dec|Q4-2016|
| May|Q2-2017|
| Jul|Q3-2017|
+-----+-------+
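The same idea can be sketched without Spark, to make the sort key explicit: map the month abbreviation to its number and pull the year out of the quarter string, then sort on (year, month). This is a plain-Scala illustration of the technique, not the answer's exact code; the object and helper names are illustrative.

```scala
import java.time.Month
import java.time.format.TextStyle
import java.util.Locale

object CustomSort {
  // map "Jan" -> 1, ..., "Dec" -> 12 (illustrative helper)
  private val monthIndex: Map[String, Int] =
    Month.values.map(m => m.getDisplayName(TextStyle.SHORT, Locale.ENGLISH) -> m.getValue).toMap

  // sort key: (year taken from "Qn-yyyy", month number) -- the same
  // ordering the unix_timestamp trick produces, just without Spark
  def key(month: String, quarter: String): (Int, Int) =
    (quarter.substring(3).toInt, monthIndex(month))

  def main(args: Array[String]): Unit = {
    val rows = Seq(("Jul", "Q3-2017"), ("May", "Q2-2017"),
                   ("Jan", "Q1-2016"), ("Dec", "Q4-2016"), ("Aug", "Q1-2016"))
    rows.sortBy { case (m, q) => key(m, q) }.foreach(println)
  }
}
```

Sorting the sample rows this way yields the same order as the unix_timestamp version: Jan/Q1-2016, Aug/Q1-2016, Dec/Q4-2016, May/Q2-2017, Jul/Q3-2017.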
Upvotes: 2