Reputation: 165
I want to get the maximum value from a date type column in a pyspark dataframe. Currently, I am using a command like this:
df.select('col1').distinct().orderBy('col1').collect()[0]['col1']
Here "col1"
is the datetime type column. It works fine but I want to avoid the use of collect()
here as i am doubtful that my driver may get overflowed.
Any advice would be helpful.
Upvotes: 4
Views: 12921
Reputation: 21
The simplest and cleanest:
max_val = df.selectExpr("MAX(col1)").collect()[0][0]
Since the aggregation reduces the data to a single row, collect() here only brings that one value back to the driver.
Upvotes: 0
Reputation: 1
Even shorter:
from pyspark.sql import functions as func
maxValue = df.select(func.max(df['col1'])).collect()[0][0]
Upvotes: 0
Reputation: 45339
No need to sort; you can just select the maximum:
from pyspark.sql.functions import col, max
res = df.select(max(col('col1')).alias('max_col1')).first().max_col1
Or you can use selectExpr:
res = df.selectExpr('max(col1) as max_col1').first().max_col1
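For reference, a minimal self-contained sketch of this approach; the local SparkSession and the example dates are illustrative assumptions, while the column name col1 comes from the question:

from datetime import date
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Illustrative date column named col1, as in the question
df = spark.createDataFrame(
    [(date(2020, 1, 1),), (date(2021, 6, 15),), (date(2019, 12, 31),)],
    ["col1"],
)

# The aggregation runs on the executors; only the single-row result
# is returned to the driver, so there is no memory-overflow concern.
max_date = df.select(F.max("col1").alias("max_col1")).first().max_col1
print(max_date)  # 2021-06-15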
Upvotes: 6