SheCodes
SheCodes

Reputation: 595

Filter Dataframe based on Timestamp column

My requirement is to filter dataframe based on timestamp column such that data which are only 10 minutes old. Dataframe looks like:

ID,timestamp,value
ID-1,8/23/2017 6:11:13,4.56
ID-2,8/23/2017 6:5:21,5.92
ID-3,8/23/2017 5:49:13,6.00

I am trying the following code but not getting the desired output

df.filter(to_date(df("timestamp")).gt(lit(now.get(Calendar.MINUTE)-10)))

Any help will be appreciated.

Upvotes: 2

Views: 7239

Answers (1)

Ramesh Maharjan
Ramesh Maharjan

Reputation: 41957

Given the dataframe as

+----+------------------+-----+
|ID  |timestamp         |value|
+----+------------------+-----+
|ID-1|8/23/2017 14:48:13|4.56 |
|ID-2|8/23/2017 6:5:21  |5.92 |
|ID-3|8/23/2017 5:49:13 |6.0  |
+----+------------------+-----+ 

and current time as

2017-08-23 14:53:33

You can do the following

import org.apache.spark.sql.functions._
df.withColumn("timestamp", unix_timestamp($"timestamp", "MM/dd/yyyy HH:mm:ss"))
      .filter((unix_timestamp(current_timestamp()) - $"timestamp")/60 < 10)
    .select($"ID", $"timestamp".cast(TimestampType), $"value")

where unix_timestamp will convert the DateTimeStamp to long and current_timestamp() will give the current date and time

You should have final dataframe as

+----+---------------------+-----+
|ID  |timestamp            |value|
+----+---------------------+-----+
|ID-1|2017-08-23 14:48:13.0|4.56 |
+----+---------------------+-----+

I hope the answer is helpful

Upvotes: 4

Related Questions