Reputation: 560
I have to filter records in dataframe with all records greater than a specific timestamp.
My timestamp column in dataframe is in string format.
Name DOJ
---------
Ram 01-Jan-2000 00.00.00.001 AM
Tom 01-Jan-2020 00.00.00.001 AM
Ali 01-Jan-2010 00.00.00.001 AM
df.PrintSchema
Name|--String
DOJ |--String
I need to filter all records which are greater than ("01-Jan-2005 00.00.00.001 AM")
Everything is in String Format
I tried various ways but results are not consistent it seems it comparing string. Probably I am facing issues in converting literal to timestamp and getting variable results.
Upvotes: 0
Views: 2142
Reputation: 1712
you cannot compare time when it is in string. Pyspark offers a api to convert strings to timestamp : https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html?highlight=to_timestamp#pyspark.sql.functions.to_timestamp With this you will be able to convert your column as a dateType and then can do comparisions.
Try it out.
In your case, you are using the 12 hour format (hh), here the allowed values are 1-12 as per SimpleDateFormat that spark uses. Refer https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html
So if you need a 24 hour format change the hour string to HH
Since you are giving 00 to the hh, it is becoming invalid. The below code works . Also careful about the format you give and the string format. if there is a mismatch then you get null.
import pyspark.sql.functions as F
tst=sqlContext.createDataFrame([('Ram','01-Jan-2000 01.00.00.001 AM'),('Tom','01-Jan-2020 11.00.00.001 AM'),('Ali',' 01-Jan-2010 07.00.00.001 AM')],schema=["Name","DOJ"])
tst1=tst.withColumn("ts",F.to_timestamp(F.col("DOJ"),format="dd-MMM-yyyy hh.mm.ss.SSS aa"))
#%%
tst2 =tst1.filter(F.col('ts')>F.to_timestamp(F.lit("01-Jan-2005 01.00.00.001 AM"),format="dd-MMM-yyyy hh.mm.ss.SSS aa"))
results :
|Name|DOJ |ts |
+----+----------------------------+-------------------+
|Tom |01-Jan-2020 11.00.00.001 AM |2020-01-01 11:00:00|
|Ali | 01-Jan-2010 07.00.00.001 AM|2010-01-01 07:00:00|
+----+----------------------------+-------------------+
Upvotes: 1