John
John

Reputation: 349

Pyspark to filter date in format yyyy/mm/dd

I want to filter my data for Datetime column in the format yyy-mm-dd. However, its string value and there is a timestamp associated with the date. I don't want this timestamp in my column. I am using Pyspark for it.

Format of date- 2021/09/23 09:00:00+00,

Format to be done- 2021-09-23

from pyspark.sql.functions import to_date
df = df_pyspark.withColumn("date_only",to_date(col("DateTime"))) #col name in data is DateTime

The date_only is showing null values. How should I approach here?

Upvotes: 0

Views: 1681

Answers (1)

Steven
Steven

Reputation: 15318

When using the fonction to_date, you need to pass a format string. The format string can be created using the official documentation for simpleDateFormat - avaialble from the spark documentation directly.
In your case, the format is yyyy/MM/dd HH:mm:ssX :

df.withColumn("t", F.to_date("datetime", "yyyy/MM/dd HH:mm:ssX")).show(truncate=False)                                                                                              
+----------------------+----------+
|DateTime              |t         |
+----------------------+----------+
|2021/09/23 09:00:00+00|2021-09-23|
+----------------------+----------+

you can then filter on the date :

df.where(F.to_date("datetime", "yyyy/MM/dd HH:mm:ssX") == "2021-09-23").show()                                                                                                       
+--------------------+                                                          
|            DateTime|
+--------------------+
|2021/09/23 09:00:...|
+--------------------+

Upvotes: 1

Related Questions