Shankar
Shankar

Reputation: 8957

How to convert Timestamp to Date format in DataFrame?

I have a DataFrame with Timestamp column, which i need to convert as Date format.

Is there any Spark SQL functions available for this?

Upvotes: 32

Views: 88913

Answers (5)

PrithuBarnwal
PrithuBarnwal

Reputation: 21

Best thing to use..tried and tested -

df_join_result.withColumn('order_date', df_join_result['order_date'].cast('date'))

Upvotes: 0

skybutter
skybutter

Reputation: 116

For Spark 2.4+,

import spark.implicits._
val newDF = df.withColumn("dateColumn", $"timestampColumn".cast(DateType))    

OR

val newDF = df.withColumn("dateColumn", col("timestampColumn").cast(DateType))

Upvotes: 4

dslack
dslack

Reputation: 845

In SparkSQL:

SELECT
  CAST(the_ts AS DATE) AS the_date
FROM the_table

Upvotes: 22

Marsellus Wallace
Marsellus Wallace

Reputation: 18601

Imagine the following input:

val dataIn = spark.createDataFrame(Seq(
        (1, "some data"),
        (2, "more data")))
    .toDF("id", "stuff")
    .withColumn("ts", current_timestamp())

dataIn.printSchema
root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)

You can use the to_date function:

val dataOut = dataIn.withColumn("date", to_date($"ts"))

dataOut.printSchema
root
 |-- id: integer (nullable = false)
 |-- stuff: string (nullable = true)
 |-- ts: timestamp (nullable = false)
 |-- date: date (nullable = false)

dataOut.show(false)
+---+---------+-----------------------+----------+
|id |stuff    |ts                     |date      |
+---+---------+-----------------------+----------+
|1  |some data|2017-11-21 16:37:15.828|2017-11-21|
|2  |more data|2017-11-21 16:37:15.828|2017-11-21|
+---+---------+-----------------------+----------+

I would recommend preferring these methods over casting and plain SQL.

Upvotes: 6

Daniel de Paula
Daniel de Paula

Reputation: 17862

You can cast the column to date:

Scala:

import org.apache.spark.sql.types.DateType

val newDF = df.withColumn("dateColumn", df("timestampColumn").cast(DateType))

Pyspark:

df = df.withColumn('dateColumn', df['timestampColumn'].cast('date'))

Upvotes: 70

Related Questions