Reputation: 139
I have a DataFrame with two columns, "date" (dtype: string) and "modified" (dtype: bigint), as shown below:
+-------------------------------------+-------------+
| date| modified|
+-------------------------------------+-------------+
|Mon, 18 Dec 2017 22:52:37 +0000 (UTC)|1513637587000|
| Mon, 18 Dec 2017 22:52:23 +0000|1513637587000|
| Mon, 18 Dec 2017 22:52:03 +0000|1513637587000|
|Mon, 18 Dec 2017 22:51:43 +0000 (UTC)|1513637527000|
| Mon, 18 Dec 2017 22:51:31 +0000|1513637527000|
| Mon, 18 Dec 2017 22:51:38 +0000|1513637527000|
| Mon, 18 Dec 2017 22:51:09 +0000|1513637526000|
| Mon, 18 Dec 2017 22:50:55 +0000|1513637466000|
| Mon, 18 Dec 2017 22:50:35 +0000|1513637466000|
| Mon, 18 Dec 2017 17:49:35 -0500|1513637407000|
+-------------------------------------+-------------+
How can I extract the date in yyyy-MM-dd format (2017-12-18)
from either of the two columns? I tried using unix_timestamp
and to_timestamp,
but neither worked; both give a null
value.
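For reference, a minimal sketch of what such a failing attempt presumably looks like (the exact code isn't in the question, and the dataframe name df is assumed): unix_timestamp falls back to its default pattern yyyy-MM-dd HH:mm:ss, which does not match strings like "Mon, 18 Dec 2017 22:52:37 +0000 (UTC)", and to_timestamp without a format only accepts ISO-8601-like strings, so both return null.
import pyspark.sql.functions as F

# hypothetical reproduction of the failing attempt (not the actual code from the question)
df.select(
    F.unix_timestamp("date").alias("as_unix"),  # null: default pattern does not match
    F.to_timestamp("date").alias("as_ts")       # null: string is not ISO-8601
).show(3, truncate=False)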
Upvotes: 3
Views: 4600
Reputation: 32720
Many questions have already been posted here on how to convert strings to dates in Spark (Convert pyspark string to date format, Convert date from String to Date format in Dataframes, ...).
You are getting null because the modified
column is epoch time in milliseconds; you need to divide it by 1000 to get seconds before converting it to a timestamp:
from pyspark.sql import functions as F

df1 = df.withColumn(
    # epoch milliseconds -> seconds, then timestamp, then date
    "modified_as_date",
    F.to_timestamp(F.col("modified") / 1000).cast("date")
).withColumn(
    # parse the RFC-2822-style string with an explicit pattern
    "date_as_date",
    F.to_date("date", "EEE, dd MMM yyyy HH:mm:ss")
)
df1.show(truncate=False)
#+-------------------------------------+-------------+----------------+------------+
#|date |modified |modified_as_date|date_as_date|
#+-------------------------------------+-------------+----------------+------------+
#|Mon, 18 Dec 2017 22:52:37 +0000 (UTC)|1513637587000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 22:52:23 +0000 |1513637587000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 22:52:03 +0000 |1513637587000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 22:51:43 +0000 (UTC)|1513637527000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 22:51:31 +0000 |1513637527000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 22:51:38 +0000 |1513637527000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 22:51:09 +0000 |1513637526000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 22:50:55 +0000 |1513637466000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 22:50:35 +0000 |1513637466000|2017-12-18 |2017-12-18 |
#|Mon, 18 Dec 2017 17:49:35 -0500 |1513637407000|2017-12-18 |2017-12-18 |
#+-------------------------------------+-------------+----------------+------------+
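As a quick sanity check outside Spark (plain Python, just to illustrate the milliseconds-versus-seconds point; not part of the Spark job), dividing the epoch value by 1000 before converting gives the expected day:
from datetime import datetime, timezone

# 1513637587000 is epoch time in milliseconds; divide by 1000 to get seconds
datetime.fromtimestamp(1513637587000 / 1000, tz=timezone.utc).date()
# -> datetime.date(2017, 12, 18)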
Upvotes: 1
Reputation: 42422
You can use from_unixtime
to convert the bigint Unix timestamp (divided by 1000, since it is in milliseconds) into a formatted timestamp string, and then cast that to the date type:
import pyspark.sql.functions as F

# milliseconds -> seconds, convert with from_unixtime, then cast to date
df2 = df.withColumn('parsed_date', F.from_unixtime(F.col('modified') / 1000).cast('date'))
df2.show()
+--------------------+-------------+-----------+
| date| modified|parsed_date|
+--------------------+-------------+-----------+
|Mon, 18 Dec 2017 ...|1513637587000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637587000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637587000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637527000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637527000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637527000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637526000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637466000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637466000| 2017-12-18|
|Mon, 18 Dec 2017 ...|1513637407000| 2017-12-18|
+--------------------+-------------+-----------+
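If a yyyy-MM-dd string is needed rather than a date column, date_format can be applied on top of the result (a small follow-up sketch reusing df2 from above; the column name parsed_date_str is just illustrative):
# format the parsed date back into a 'yyyy-MM-dd' string
df3 = df2.withColumn('parsed_date_str', F.date_format('parsed_date', 'yyyy-MM-dd'))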
Upvotes: 3