Adika Stadevant

Reputation: 69

Select the row with the latest date per order in a PySpark DataFrame

I have a table like the one shown below. The order numbers recur with different dates, and I would like to read only one row per order: the one with the latest date. For example, for order A1 I only want the row dated 24/03/2022. How can I do this in PySpark? Thanks.

This is my data table:

Upvotes: 2

Views: 2204

Answers (1)

Adika Stadevant

Reputation: 69

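from pyspark.sql import Window
from pyspark.sql import functions as F

# Number the rows within each order, newest date first
w = Window.partitionBy('order').orderBy(F.col('date').desc())

df = df.withColumn('rank', F.row_number().over(w))

# Keep only the newest row per order, then drop the helper column
df = df.filter(F.col('rank') == 1).drop('rank')

I solved this by ranking the rows within each order by date, newest first, and keeping only the row with rank 1.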
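One thing worth checking is the type of the date column. If the dates are stored as text in dd/MM/yyyy form (as 24/03/2022 in the question suggests), sorting them as strings does not give chronological order, so they would need to be parsed first; in PySpark that could be done with F.to_date and the 'dd/MM/yyyy' format. The plain-Python sketch below only illustrates the "latest row per order" logic and the string-sorting pitfall. The sample rows and the function name latest_per_order are made up for illustration and are not the asker's data.

```python
from datetime import datetime

# Hypothetical sample rows (order, date as dd/MM/yyyy text); not the asker's data.
rows = [
    ("A1", "30/12/2021"),
    ("A1", "24/03/2022"),
    ("A2", "15/01/2022"),
    ("A2", "02/02/2022"),
]


def latest_per_order(rows):
    """Return {order: date text}, keeping the chronologically latest date."""
    latest = {}
    for order, date_text in rows:
        # Parse the text so the comparison is chronological, not alphabetical.
        parsed = datetime.strptime(date_text, "%d/%m/%Y")
        # Replace the stored row only when this one is strictly newer.
        if order not in latest or parsed > latest[order][0]:
            latest[order] = (parsed, date_text)
    return {order: text for order, (_, text) in latest.items()}


result = latest_per_order(rows)
print(result)

# Comparing the raw strings picks the wrong row for A1:
print(max("30/12/2021", "24/03/2022"))
```

With real date or timestamp columns this concern goes away, since Spark orders those chronologically.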

Upvotes: 4
