Srinathji Kyadari

Reputation: 221

Filter records between particular hours, minutes and seconds in Spark data frames

Suppose I have a data frame

+---------------------+-------+-----+
|           timestamp | login | Age |
+---------------------+-------+-----+
| 2016-06-01 01:05:20 |  7372 |  50 |
| 2016-06-01 01:00:20 |  7374 |  35 |
| 2016-06-01 01:10:20 |  7376 |  40 |
+---------------------+-------+-----+

I want only the records between 01:00 and 01:10, irrespective of the date.

The time is stored as a unix_timestamp formatted as "yyyy-MM-dd HH:mm:ss".

How can I extract those records? This is to analyze people who are arriving late.

Upvotes: 0

Views: 4532

Answers (3)

Srinathji Kyadari

Reputation: 221

I achieved it using the code below:

val attendenceDF = DF.withColumn(
  "Attendence",
  when(date_format(DF("timestamp"), "HH:mm:ss").between("01:00:00", "01:10:00"), "InTime")
    .otherwise("NotInTime")
)

attendenceDF.show()

+---------------------+-------+-----+------------+
|           timestamp | login | Age | Attendence |
+---------------------+-------+-----+------------+
| 2016-06-01 01:05:20 |  7372 |  50 | InTime     |
| 2016-06-01 01:00:20 |  7374 |  35 | InTime     |
| 2016-06-01 01:10:20 |  7376 |  40 | NotInTime  |
+---------------------+-------+-----+------------+

Upvotes: 2

Daniel de Paula

Reputation: 17872

You could try using the hour and minute functions from the functions package:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val tsCol = col("timestamp").cast(TimestampType)

val filteredDF = df.filter(
  (hour(tsCol) === 1) && (minute(tsCol).between(0, 10))
)

Upvotes: 0

Idan Fischman

Reputation: 57

If the timestamp is of type string, you can do it with a substring.

If it is a unix timestamp, you could convert it first; but it is more efficient to look at the exact library and format the value is stored in, and check for a way to extract the hour and minute directly.
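As an illustration of the substring idea, here is a minimal plain-Scala sketch (no Spark needed; the sample timestamps and the [01:00:00, 01:10:00] window are taken from the question). In Spark itself the equivalent would be something like substring(col("timestamp"), 12, 8):

```scala
// A "yyyy-MM-dd HH:mm:ss" string carries the time-of-day starting at index 11.
val timestamps = Seq("2016-06-01 01:05:20", "2016-06-01 01:00:20", "2016-06-01 01:10:20")

// Keep rows whose time-of-day falls in [01:00:00, 01:10:00];
// plain lexicographic comparison works because the format is fixed-width.
val inWindow = timestamps.filter { ts =>
  val timeOfDay = ts.substring(11) // "HH:mm:ss"
  timeOfDay >= "01:00:00" && timeOfDay <= "01:10:00"
}
```

Because the string format is zero-padded and fixed-width, string comparison orders the same way the times do, so no parsing is required for the filter.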

hope it helps you :)

Upvotes: -1
