Reputation: 5498
Say I have a dataframe as below:
mid | bid | m_date1    | m_date2 | m_date3
100 | ws  |            |         | 2022-02-01
200 | gs  | 2022-02-01 |         |
Now I have an SQL aggregation as below:
SELECT
mid,
bid,
min(NEXT(m_date1, 'SAT')) as dat1,
min(NEXT(m_date2, 'SAT')) as dat2,
min(NEXT(m_date3, 'SAT')) as dat3
FROM df
GROUP BY 1, 2
I am looking to implement the above aggregation in PySpark, but I am wondering whether I can use some form of iteration to produce dat1, dat2 and dat3, since the same 'min' function is applied to each of those columns. I could use the aggregation syntax below in PySpark for each column, but I would like to avoid repeating the 'min' call for every aggregated column.
df.groupBy('mid','bid').agg(...)
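Spelled out fully, the repetitive form I am trying to avoid would look roughly like this (assuming PySpark's next_day corresponds to NEXT in the SQL above):

from pyspark.sql import functions as F

# one explicit min/next_day pair per date column
df.groupBy('mid', 'bid').agg(
    F.min(F.next_day('m_date1', 'Sat')).alias('dat1'),
    F.min(F.next_day('m_date2', 'Sat')).alias('dat2'),
    F.min(F.next_day('m_date3', 'Sat')).alias('dat3'),
)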
Thank you
Upvotes: 1
Views: 232
Reputation: 26676
A sample of the expected output would have been helpful. If I understood you correctly, you are after:
from pyspark.sql import functions as F
df.groupby('mid', 'bid').agg(*[F.min(c).alias(f"min{c}") for c in df.drop('mid', 'bid').columns]).show()
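If the NEXT(..., 'SAT') part of the SQL also needs to be reproduced, the same comprehension can wrap each column in next_day first. A rough sketch, assuming next_day is the function you meant by NEXT:

from pyspark.sql import functions as F

# all columns except the grouping keys
date_cols = [c for c in df.columns if c not in ('mid', 'bid')]

# apply min(next_day(col, 'Sat')) to every date column, aliased dat1, dat2, ...
df.groupBy('mid', 'bid').agg(
    *[F.min(F.next_day(c, 'Sat')).alias(f'dat{i + 1}') for i, c in enumerate(date_cols)]
).show()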
Upvotes: 1