user13597413

Reputation:

Between function in Spark using Java

I have two dataframes:

Dataframe 1
+-----------------+-----------------+
|    hour_Entre   |   hour_Sortie   |
+-----------------+-----------------+
|     18:30:00    |     05:00:00    |
+-----------------+-----------------+

Dataframe 2
+-----------------+
|  hour_Tracking  |
+-----------------+
|     19:30:00    |
+-----------------+

I want to keep the hour_Tracking values that are between hour_Entre and hour_Sortie.

I tried the following code:

boolean checked = true;
try {
    if (df1.select(col("heureSortie")) != null && df1.select(col("heureEntre")) != null) {
        checked = checked && df2.select(col("dateTracking_hour_minute")
                .between(df1.select(col("heureSortie")), df1.select(col("heureEntre"))));
    }
} catch (Exception e) {
    e.printStackTrace();
}

But I get this error:

Operator '&&' cannot be applied to 'boolean', 'org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>'
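As far as I can tell, df1.select(col("heureSortie")) returns a Dataset<Row>, and between(...) builds a Column expression rather than a Java boolean, which seems to be what the compiler is objecting to. The types involved, using the column names from my snippet:

// select(...) returns a Dataset, not a scalar value that && can test
Dataset<Row> sortie = df1.select(col("heureSortie"));

// between(...) builds a Column predicate; it is evaluated per row inside
// filter()/where(), never as a Java boolean
Column predicate = col("dateTracking_hour_minute")
        .between(col("heureSortie"), col("heureEntre"));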

Upvotes: 0

Views: 188

Answers (1)

dsk

Reputation: 2011

In case you are looking for the hour difference:

First, create the date difference:

from pyspark.sql import functions as F

# whole-day difference between the two dates
df = df.withColumn('date_diff', F.datediff(F.to_date(df.hour_Entre), F.to_date(df.hour_Sortie)))

Then calculate the hour difference from that:

df = df.withColumn('hours_diff',
                   (df.date_diff * 24) + F.hour(df.hour_Entre) - F.hour(df.hour_Sortie))
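And for the original between requirement in Java: Column.between returns a Column predicate, so it belongs inside filter()/where(), not in a Java &&. A rough sketch, assuming df1 holds a single row of bounds and the hours are zero-padded strings. Note that the sample interval (18:30:00 to 05:00:00) crosses midnight, so the check is split into two comparisons instead of a single between():

import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Pair each tracking row with the single row of bounds.
Dataset<Row> joined = df2.crossJoin(df1);

// The interval wraps past midnight, so "inside the interval" means:
// after hour_Entre (evening) OR before hour_Sortie (early morning).
Column inInterval = col("hour_Tracking").geq(col("hour_Entre"))
        .or(col("hour_Tracking").leq(col("hour_Sortie")));

Dataset<Row> result = joined.filter(inInterval).select(col("hour_Tracking"));
result.show();

For an interval that does not cross midnight, a single col("hour_Tracking").between(col("hour_Entre"), col("hour_Sortie")) inside filter() would do.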

Upvotes: 1
