Rahul

Reputation: 319

Converting a date pattern in a Spark DataFrame

I have a column of String datatype in a Spark DataFrame (holding dates in the yyyy-MM-dd pattern), and I want to display the column values in the MM/dd/yyyy pattern.

My data is

val df = sc.parallelize(Array(
  ("steak", "1990-01-01", "2000-01-01", 150),
  ("steak", "2000-01-02", "2001-01-13", 180),
  ("fish",  "1990-01-01", "2001-01-01", 100)
)).toDF("name", "startDate", "endDate", "price")

df.show()

+-----+----------+----------+-----+
| name| startDate|   endDate|price|
+-----+----------+----------+-----+
|steak|1990-01-01|2000-01-01|  150|
|steak|2000-01-02|2001-01-13|  180|
| fish|1990-01-01|2001-01-01|  100|
+-----+----------+----------+-----+

root
 |-- name: string (nullable = true)
 |-- startDate: string (nullable = true)
 |-- endDate: string (nullable = true)
 |-- price: integer (nullable = false)

I want to show endDate in the MM/dd/yyyy pattern. All I have managed so far is converting the column from String to DateType:

import org.apache.spark.sql.types.DateType

val df2 = df.select($"endDate".cast(DateType).alias("endDate"))

df2.show()

+----------+
|   endDate|
+----------+
|2000-01-01|
|2001-01-13|
|2001-01-01|
+----------+

df2.printSchema()

root
 |-- endDate: date (nullable = true)

I want to show endDate in the MM/dd/yyyy pattern. The only reference I found is this, which doesn't solve the problem.

Upvotes: 5

Views: 16014

Answers (3)

Suppose a DataFrame/Dataset has a string column holding a date value, and we need to change the date format.

For the question asked, the date format can be changed as below:

val df1 = df.withColumn("startDate1", date_format(to_date(col("startDate"),"yyyy-MM-dd"),"MM/dd/yyyy" ))

In Spark, the default date format is "yyyy-MM-dd", so this can be rewritten as:

val df1 = df.withColumn("startDate1", date_format(col("startDate"),"MM/dd/yyyy" ))

(i) Applying to_date changes the datatype of the column from string to Date. It also tells to_date that the strings in this column are in the yyyy-MM-dd format, so it parses them accordingly. (ii) Next, date_format renders the date in the format we require, which is MM/dd/yyyy.
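Putting both steps together, a minimal sketch (using the question's df and column names; a SparkSession and its implicits are assumed to be in scope):

import org.apache.spark.sql.functions.{col, date_format, to_date}

// Step (i): parse the yyyy-MM-dd strings into a DateType column.
val parsed = df.withColumn("startDateParsed", to_date(col("startDate"), "yyyy-MM-dd"))
parsed.printSchema()  // startDateParsed is now of type date

// Step (ii): render the dates back out as strings in the target pattern.
parsed
  .withColumn("startDate1", date_format(col("startDateParsed"), "MM/dd/yyyy"))
  .select("startDate", "startDate1")
  .show()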

When a time component is involved, use to_timestamp instead of to_date. Note that 'MM' represents the month and 'mm' represents minutes.
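For illustration, a hedged sketch with a made-up timestamp column (the eventTime column and its sample value are assumptions, not from the question):

import spark.implicits._
import org.apache.spark.sql.functions.{col, date_format, to_timestamp}

// Hypothetical sample data with a time component.
val events = Seq("2000-01-02 09:30:00").toDF("eventTime")

events
  .withColumn("ts", to_timestamp(col("eventTime"), "yyyy-MM-dd HH:mm:ss"))  // parse date and time
  .withColumn("formatted", date_format(col("ts"), "MM/dd/yyyy HH:mm"))      // MM = month, mm = minutes
  .show(false)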

Upvotes: 0

Abhishek Bansal

Reputation: 1575

Use the date_format(date, format) function (pyspark.sql.functions.date_format in Python, org.apache.spark.sql.functions.date_format in Scala):

import org.apache.spark.sql.functions.date_format

val df2 = df.select(date_format($"endDate", "MM/dd/yyyy").alias("endDate"))
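Note that date_format returns a plain string column, which is what you want when the goal is only display formatting. A quick check (sketch, on the question's data):

df2.printSchema()
// root
//  |-- endDate: string (nullable = true)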

Upvotes: 1

Sanchit Grover

Reputation: 1008

You can use the date_format function:

import sqlContext.implicits._
import org.apache.spark.sql.functions._

val df = sc.parallelize(Array(
  ("steak", "1990-01-01", "2000-01-01", 150),
  ("steak", "2000-01-02", "2001-01-13", 180),
  ("fish",  "1990-01-01", "2001-01-01", 100)
)).toDF("name", "startDate", "endDate", "price")

df.show()

df.select(date_format(col("endDate"), "MM/dd/yyyy")).show()

Output:

+-------------------------------+
|date_format(endDate,MM/dd/yyyy)|
+-------------------------------+
|                     01/01/2000|
|                     01/13/2001|
|                     01/01/2001|
+-------------------------------+
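As a small refinement (a sketch, not part of the original answer), you can alias the expression so the output column keeps a readable name instead of the auto-generated one:

df.select(date_format(col("endDate"), "MM/dd/yyyy").alias("endDate")).show()

This prints the same values under the header endDate.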

Upvotes: 8
