Rocky1989

Reputation: 399

Another way of passing an orderBy list to a PySpark Window method

I have a concern about performing a window operation on a PySpark DataFrame. I want to get the latest records from the input table with the condition below, but I want to avoid the for loop:

from pyspark.sql import Window
from pyspark.sql.functions import col, rank

groupby_col = ["col('customer_id')"]
orderby_col = ["col('process_date').desc()", "col('load_date').desc()"]

# eval() turns each string into a real Column expression
window_spec = Window.partitionBy([eval(x) for x in groupby_col]).orderBy([eval(x) for x in orderby_col])

df = df.withColumn("rank", rank().over(window_spec))
df = df.filter(col('rank') == 1)

My concern is that I have to convert each string in orderby_col into a Column expression with eval() inside a for loop. Could you please let me know how to pass multiple columns to orderBy in descending order without using a for loop?
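For reference, the eval() calls above simply expand to this hand-written equivalent, using only the columns already named in the lists:

window_spec = Window.partitionBy(col('customer_id')).orderBy(col('process_date').desc(), col('load_date').desc())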

Upvotes: 0

Views: 462

Answers (1)

Marcin Szczepański

Reputation: 31

from pyspark.sql import Window
import pyspark.sql.functions as f

groupby_col = ["customer_id"]
orderby_col = ["process_date", "load_date"]

# f.desc() accepts a single column name, so apply it to each entry
# with map() and unpack the results into orderBy -- no eval() and
# no explicit for loop needed
window_spec = Window.partitionBy(*groupby_col).orderBy(*map(f.desc, orderby_col))

df = df.withColumn("rank", f.rank().over(window_spec))
df = df.filter(f.col('rank') == 1)
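A minimal end-to-end sketch, assuming an active SparkSession named spark and made-up sample rows, to show the deduplication this produces:

from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: two loads for customer 1, one for customer 2
df = spark.createDataFrame(
    [(1, "2021-01-01", "2021-01-02"),
     (1, "2021-01-03", "2021-01-04"),
     (2, "2021-01-01", "2021-01-01")],
    ["customer_id", "process_date", "load_date"],
)

window_spec = Window.partitionBy("customer_id").orderBy(*map(f.desc, ["process_date", "load_date"]))

# Keep only the newest record per customer
df.withColumn("rank", f.rank().over(window_spec)).filter(f.col("rank") == 1).show()
# Expected: one row per customer_id, holding the latest process_date/load_date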

Upvotes: 1
