Naveen Srikanth

Reputation: 789

pyspark: compare only the time from a timestamp

I am using PySpark 3.x.

I am finding it difficult to code the scenario below.

I have the below column as a timestamp in PySpark:

time_val

2021-08-02 03:50:00
2021-08-01 10:05:00

My output should be as below:

time_val                flg

2021-08-02 03:50:00     flg_1
2021-08-01 10:05:00     flg_2

The condition that should be applied is:

  Irrespective of the day (01, 02, etc. as given above), if time_val is between 3:45 am and 10:00 am then flg_1, else flg_2.

I am struggling to apply the time conditions in PySpark.

Any help is highly appreciated.

Upvotes: 0

Views: 340

Answers (1)

ggordon

Reputation: 10035

You may achieve this with the assistance of the hour and minute functions. You can extract the hour and minutes, combine them into a scalar value (e.g. seconds since midnight), and compare that value to 3:45 am and 10:00 am expressed in the same unit.
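Before wiring this into Spark, the arithmetic can be sanity-checked in plain Python. This is just a sketch of the comparison logic; the `flag` helper is hypothetical and not part of the Spark code below:

```python
from datetime import datetime

def flag(ts: str) -> str:
    """Classify a timestamp by its time of day: 3:45 am-10:00 am -> flg_1."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    # scalar value: seconds since midnight
    seconds = t.hour * 60 * 60 + t.minute * 60
    # 3:45 am = 13500 s, 10:00 am = 36000 s; bounds are inclusive,
    # matching SQL BETWEEN and pyspark's Column.between
    return "flg_1" if 3*60*60 + 45*60 <= seconds <= 10*60*60 else "flg_2"

print(flag("2021-08-02 03:50:00"))  # flg_1
print(flag("2021-08-01 10:05:00"))  # flg_2
```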

Approach 1: Using Spark SQL

  1. Ensure your dataframe is accessible as a temporary view:

df.createOrReplaceTempView("my_data")

  2. Execute the query on your Spark session:
my_results = sparkSession.sql("""
SELECT 
    time_val, 
    CASE 
        WHEN (hour(time_val)*60*60 + minute(time_val)*60 ) BETWEEN 
        3*60*60+45*60 AND 10*60*60
        THEN 'flg_1'
        ELSE 'flg_2'
    END flg 
FROM my_data
""")

Approach 2: Using Python

from pyspark.sql import functions as F

# seconds since midnight for each timestamp
my_results = df.withColumn('flg', F.hour("time_val")*60*60 + F.minute("time_val")*60)
# 3:45 am = 13500 s, 10:00 am = 36000 s (between is inclusive on both ends)
my_results = my_results.withColumn(
    'flg',
    F.when(F.col('flg').between(3*60*60 + 45*60, 10*60*60), 'flg_1').otherwise('flg_2')
)

Full reproducible example using Spark SQL:

WITH my_data AS (
    SELECT CAST('2021-08-02 03:50:00' as TIMESTAMP) time_val UNION ALL
    SELECT CAST('2021-08-01 10:05:00' as TIMESTAMP) time_val
)
SELECT 
    time_val, 
    CASE 
        WHEN (hour(time_val)*60*60 + minute(time_val)*60 ) BETWEEN 
        3*60*60+45*60 AND 10*60*60
        THEN 'flg_1'
        ELSE 'flg_2'
    END flg
FROM my_data


Upvotes: 1
