user4046073

Reputation: 871

PySpark- how to select Max value compared to a parameter

I'm trying to figure out how to get the max date from a column DateCol that is less than or equal to a variable run_time.

For example, from the input table below, I want to compare DateCol with a variable run_time. Suppose run_time = '2022-03-05'. I'd like to select the third row, as that's where the max value satisfying DateCol <= '2022-03-05' is. How can this be done? Many thanks.

+---+------------+
| ID|     DateCol|
+---+------------+
|  1|'2022-03-01'|
|  2|'2022-03-03'|
|  3|'2022-03-04'|
|  4|'2022-03-06'|
+---+------------+

Upvotes: 0

Views: 166

Answers (1)

pltc

Reputation: 6082

Just filter on run_time, then sort by date in descending order and take the first record.

from pyspark.sql import functions as F

run_time = '2022-03-05'

(df
    .where(F.col('DateCol') <= run_time)  # keep only dates at or before run_time
    .orderBy(F.desc('DateCol'))           # latest date first
    .limit(1)                             # take the single max record
    .show()
)

+---+----------+
| ID|   DateCol|
+---+----------+
|  3|2022-03-04|
+---+----------+

Upvotes: 1
