Reputation:
I have two DataFrames:
Dataframe 1
+------------+-------------+
| hour_Entre | hour_Sortie |
+------------+-------------+
| 18:30:00   | 05:00:00    |
+------------+-------------+
Dataframe 2
+---------------+
| hour_Tracking |
+---------------+
| 19:30:00      |
+---------------+
I want to keep the hour_Tracking values that fall between hour_Entre and hour_Sortie.
I tried the following code:
boolean checked = true;
try {
    if (df1.select(col("heureSortie")) != null && df1.select(col("heureEntre")) != null) {
        checked = checked && df2.select(col("dateTracking_hour_minute").between(df1.select(col("heureSortie")), df1.select(col("heureEntre"))));
    }
} catch (Exception e) {
    e.printStackTrace();
}
But I get this error:
Operator && cannot be applied to boolean , 'org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>'
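For context, the error arises because df2.select(...) returns a Dataset, not a boolean, so it cannot be combined with &&. Separately, the example window (18:30:00 to 05:00:00) crosses midnight, so a plain lower <= x <= upper comparison would reject 19:30:00. A minimal sketch of the wrap-around check in plain Python, where the helper name in_window is hypothetical and the times are taken from the tables above:

```python
from datetime import time


def in_window(tracking: time, start: time, end: time) -> bool:
    """Return True if tracking lies in [start, end]; the window may cross midnight."""
    if start <= end:
        # Same-day window, e.g. 08:00 to 17:00.
        return start <= tracking <= end
    # Overnight window, e.g. 18:30 to 05:00: accept the evening part or the morning part.
    return tracking >= start or tracking <= end


hour_entre = time(18, 30)   # value from DataFrame 1
hour_sortie = time(5, 0)    # value from DataFrame 1

print(in_window(time(19, 30), hour_entre, hour_sortie))  # True
print(in_window(time(12, 0), hour_entre, hour_sortie))   # False
```

In Spark, the same condition would have to be written as a Column expression passed to a filter or join condition rather than evaluated with &&.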
Upvotes: 0
Views: 188
Reputation: 2011
If you are looking for the difference in hours, first compute the date difference
(this assumes the columns hold full date-times that to_date can parse):
from pyspark.sql import functions as F
df = df.withColumn('date_diff', F.datediff(F.to_date(df.hour_Entre), F.to_date(df.hour_Sortie)))
Then compute the hour difference from it:
df = df.withColumn('hours_diff', (df.date_diff * 24) +
                   F.hour(df.hour_Entre) - F.hour(df.hour_Sortie))
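To see what this formula computes, here is a small plain-Python sketch of the same arithmetic. The function name hours_diff and the two dates are hypothetical, chosen only to illustrate an overnight stay; they do not come from the question's data.

```python
from datetime import datetime


def hours_diff(entre: datetime, sortie: datetime) -> int:
    # Whole-day difference between the calendar dates (entre minus sortie),
    # mirroring datediff(to_date(entre), to_date(sortie)).
    date_diff = (entre.date() - sortie.date()).days
    # Add the difference of the hour fields; minutes and seconds are ignored.
    return date_diff * 24 + entre.hour - sortie.hour


entre = datetime(2021, 1, 1, 18, 30)   # hypothetical entry time
sortie = datetime(2021, 1, 2, 5, 0)    # hypothetical exit time, next morning

print(hours_diff(entre, sortie))  # -11
```

The result is negative because the entry precedes the exit; swapping the arguments gives 11. The true elapsed time here is 10.5 hours, so the formula is only accurate to the hour because it drops the minutes.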
Upvotes: 1