Reputation: 789
I am using PySpark 3.x.
I am finding it difficult to write code for the scenario below.
I have the following column (a timestamp) in PySpark:
time_val
2021-08-02 03:50:00
2021-08-01 10:05:00
My expected output is as below:
time_val flg
2021-08-02 03:50:00 flg_1
2021-08-01 10:05:00 flg_2
The condition to apply is:
Irrespective of the day (01/02 as given above), if time_val is between 3:45 AM and 10:00 AM then flg_1, else flg_2.
I am struggling to apply the time condition in PySpark.
Any help is highly appreciated.
Upvotes: 0
Views: 340
Reputation: 10035
You may achieve this with the help of the hour and minute functions. You can extract the hour and minutes to get a scalar value (seconds since midnight), which you may then compare to 3:45 AM and 10:00 AM.
df.createOrReplaceTempView("my_data")
my_results = spark.sql("""
    SELECT
        time_val,
        CASE
            WHEN (hour(time_val)*60*60 + minute(time_val)*60) BETWEEN
                 3*60*60 + 45*60 AND 10*60*60
            THEN 'flg_1'
            ELSE 'flg_2'
        END flg
    FROM my_data
""")
Alternatively, with the DataFrame API:
from pyspark.sql import functions as F

# seconds since midnight for each timestamp
secs = F.hour("time_val") * 60 * 60 + F.minute("time_val") * 60
my_results = df.withColumn("flg", F.when(secs.between(3*60*60 + 45*60, 10*60*60), "flg_1").otherwise("flg_2"))
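The arithmetic in the BETWEEN bounds is just seconds since midnight: 3:45 AM is 3*60*60 + 45*60 = 13500 seconds, and 10:00 AM is 36000 seconds. A plain-Python sketch (no Spark needed, with a hypothetical helper name) of the same comparison, to sanity-check the boundaries:

```python
# Plain-Python version of the seconds-since-midnight bucketing used above.
def flag(hour, minute):
    seconds = hour * 60 * 60 + minute * 60
    # BETWEEN is inclusive on both ends, matching the SQL above.
    return "flg_1" if 3*60*60 + 45*60 <= seconds <= 10*60*60 else "flg_2"

print(flag(3, 50))   # 03:50 -> flg_1
print(flag(10, 5))   # 10:05 -> flg_2
```

Note the inclusive bounds: 03:45:00 and 10:00:00 themselves both map to flg_1.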
A full reproducible example using Spark SQL:
WITH my_data AS (
    SELECT CAST('2021-08-02 03:50:00' AS TIMESTAMP) time_val UNION ALL
    SELECT CAST('2021-08-01 10:05:00' AS TIMESTAMP) time_val
)
SELECT
    time_val,
    CASE
        WHEN (hour(time_val)*60*60 + minute(time_val)*60) BETWEEN
             3*60*60 + 45*60 AND 10*60*60
        THEN 'flg_1'
        ELSE 'flg_2'
    END flg
FROM my_data
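As an aside, an alternative the answer above does not use: because zero-padded 'HH:mm' strings sort lexicographically in time order, you could also compare date_format(time_val, 'HH:mm') directly against the literals '03:45' and '10:00' in Spark SQL. A plain-Python sketch of that string comparison (the helper name is hypothetical):

```python
# Zero-padded HH:mm strings compare lexicographically in time order,
# so a BETWEEN '03:45' AND '10:00' on date_format(time_val, 'HH:mm')
# behaves the same as the seconds arithmetic.
def flag_by_string(hhmm):
    return "flg_1" if "03:45" <= hhmm <= "10:00" else "flg_2"

print(flag_by_string("03:50"))  # flg_1
print(flag_by_string("10:05"))  # flg_2
```

This avoids the 60*60 arithmetic at the cost of relying on the lexicographic-ordering property of fixed-width time strings.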
Upvotes: 1