Reputation: 221
Suppose I have a data frame
+--------------------+-------+-----+
|           timestamp| login | Age |
+--------------------+-------+-----+
| 2016-06-01 01:05:20|  7372 |  50 |
| 2016-06-01 01:00:20|  7374 |  35 |
| 2016-06-01 01:10:20|  7376 |  40 |
+--------------------+-------+-----+
I want only the records whose time falls between 01:00 and 01:10, irrespective of the date. The timestamp is stored as a unix_timestamp in the format "yyyy-MM-dd HH:mm:ss".
How do I extract those records? This is to analyze people who are coming in late.
Upvotes: 0
Views: 4532
Reputation: 221
I achieved it using the code below:
val attendenceDF = DF.withColumn(
  "Attendence",
  when(date_format(DF("timestamp"), "HH:mm:ss").between("01:00:00", "01:10:00"), "InTime")
    .otherwise("NotInTime")
)
attendenceDF.show()
+--------------------+-------+-----+-----------+
|           timestamp| login | Age | Attendence|
+--------------------+-------+-----+-----------+
| 2016-06-01 01:05:20|  7372 |  50 |     InTime|
| 2016-06-01 01:00:20|  7374 |  35 |     InTime|
| 2016-06-01 01:10:20|  7376 |  40 |  NotInTime|
+--------------------+-------+-----+-----------+
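The window check that between performs can be sketched in plain Scala with java.time, outside of Spark (the sample timestamp strings here are hypothetical):

```scala
import java.time.{LocalDateTime, LocalTime}
import java.time.format.DateTimeFormatter

val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Label a "yyyy-MM-dd HH:mm:ss" string the same way the between call
// does: inside the inclusive window [01:00:00, 01:10:00] it is "InTime".
def attendence(ts: String): String = {
  val t = LocalDateTime.parse(ts, fmt).toLocalTime
  if (!t.isBefore(LocalTime.of(1, 0)) && !t.isAfter(LocalTime.of(1, 10)))
    "InTime"
  else
    "NotInTime"
}
```

Note that 01:10:20 falls outside the window, since it is after 01:10:00, which matches the NotInTime row in the output above.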
Upvotes: 2
Reputation: 17872
You could try using the hour and minute functions from the functions package:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Cast the column to a timestamp, then keep rows in hour 1
// with a minute value between 0 and 10 inclusive.
val tsCol = col("timestamp").cast(TimestampType)
val filteredDF = df.filter(
  (hour(tsCol) === 1) && (minute(tsCol).between(0, 10))
)
Upvotes: 0
Reputation: 57
If the timestamp is of type string, you could do it with a substring.
If it is a unix timestamp, you could convert it, but it is more efficient to look at the exact library and format of the type it is saved in and check for a way to extract the hour and minute.
Hope it helps you :)
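A minimal sketch of the substring idea, assuming the timestamps are plain "yyyy-MM-dd HH:mm:ss" strings held in an ordinary collection (the sample rows are hypothetical):

```scala
val rows = Seq(
  ("2016-06-01 01:05:20", 7372),
  ("2016-06-01 01:00:20", 7374),
  ("2016-06-01 01:10:20", 7376)
)

// Characters 11 through 18 hold "HH:mm:ss"; zero-padded times compare
// correctly as strings, so a lexicographic range check is enough.
val inTime = rows.filter { case (ts, _) =>
  val t = ts.substring(11, 19)
  t >= "01:00:00" && t <= "01:10:00"
}
```

The same string comparison is what the date_format-plus-between approach in the accepted answer performs inside Spark.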
Upvotes: -1