Reputation: 871
I'm trying to figure out how to get the Max
date from a column DateCol
, that is less than a variable run_date
.
For example from this input table below, I want to compare the DateCol
with a variable runtime
. Suppose run_time= '2022-03-05'
, I'd like to select the third row, as that's where the Max value is, and DateCol<='2022-03-05'
How can this be done? Many thanks.
+---+---+------+
| ID| DateCol |
+---+---+-------
|1. |'2022-03-01'|
|2 |'2022-03-03'|
|3. |'2022-03-04'|
|4. |'2022-03-06'|
+---+------------+
Upvotes: 0
Views: 166
Reputation: 6082
Just do a filter based on run_time
then sort by date and get the max record.
run_time= '2022-03-05'
(df
.where(F.col('DateCol') <= run_time)
.orderBy(F.desc('DateCol'))
.limit(1)
.show()
)
+---+----------+
| ID| DateCol|
+---+----------+
| 3|2022-03-04|
+---+----------+
Upvotes: 1