user7569898

Reputation:

Sorting by month in Spark SQL

I have a column named Month with values such as

(Jan2016,Feb2016,Mar2016,Jun2016)

I'm trying to sort it with

df.orderBy("Month")

but the Month column comes out in alphabetical order:

Feb2016,Jan2016

How can I order it chronologically by month instead?

Upvotes: 1

Views: 1580

Answers (2)

Alper t. Turker

Reputation: 35249

Parsing the dates looks like the way to go:

import org.apache.spark.sql.functions.{to_date, month}

df.orderBy(month(to_date($"Month", "MMMyyyy")))

Upvotes: 0

Robin

Reputation: 695

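This is adapted from Antot's code.

import org.apache.spark.sql.functions.udf
import session.implicits._  // needed for toDF and the $"..." syntax

// Map each month abbreviation to its position in the year (Jan -> 0, ..., Dec -> 11)
val monthWithIndex = Seq("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec").zipWithIndex.toMap

// UDF that looks up the index from the first three characters of the value
val monthSim = udf( (mon : String) => {
  monthWithIndex( mon.substring( 0, 3))
})
val df = session.sparkContext.parallelize( Seq("Jan2016","Feb2016","Mar2016","Jun2016")).toDF("Month")
// Sort on a temporary index column, then drop it
df.withColumn("newMonth", monthSim($"Month")).orderBy("newMonth").drop("newMonth").show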

If you want to order by year and month, you can add a year column in the same way.
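As a rough sketch of the year-and-month idea, the key could be computed in plain Scala like this. The names MonthSortKey and sortKey are made up for illustration, and it assumes every value is a three-letter English month abbreviation followed by a four-digit year. It is shown without Spark so it can be run on its own.

```scala
object MonthSortKey {
  // Month abbreviation -> 0-based position in the year (Jan -> 0, ..., Dec -> 11)
  private val monthIndex: Map[String, Int] =
    Seq("Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec").zipWithIndex.toMap

  // Hypothetical helper: "Feb2016" -> 2016 * 12 + 1.
  // Assumes the format <3-letter month><4-digit year>; anything else throws.
  def sortKey(value: String): Int = {
    val (mon, year) = value.splitAt(3)
    year.toInt * 12 + monthIndex(mon)
  }

  def main(args: Array[String]): Unit = {
    val months = Seq("Mar2016", "Jan2017", "Feb2016", "Dec2015")
    // Sort chronologically across years, not alphabetically
    println(months.sortBy(sortKey).mkString(","))
    // prints Dec2015,Feb2016,Mar2016,Jan2017
  }
}
```

In Spark this should slot into the same pattern as above: wrap it with udf(MonthSortKey.sortKey _), add the result with withColumn, order by that column, then drop it.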

Upvotes: 1
