Reputation: 165
I want to get the maximum value from a date type column in a pyspark dataframe. Currently, I am using a command like this:
df.select('col1').distinct().orderBy('col1').collect()[0]['col1']
Here "col1"
is the datetime type column. It works fine but I want to avoid the use of collect()
here as i am doubtful that my driver may get overflowed.
Any advice would be helpful.
Upvotes: 4
Views: 12921
Reputation: 21
The simplest and cleanest:
max_val = df.selectExpr("MAX(col1)").collect()[0][0]
Since the aggregation reduces the data to a single row, collect() here only brings that one value back to the driver.
Upvotes: 0
Reputation: 1
Even shorter:
from pyspark.sql import functions as func
maxValue = df.select(func.max(df['col1'])).collect()[0][0]
Upvotes: 0
Reputation: 45339
No need to sort; you can just select the maximum:
from pyspark.sql.functions import col, max
res = df.select(max(col('col1')).alias('max_col1')).first().max_col1
Or you can use selectExpr:
res = df.selectExpr('max(col1) as max_col1').first().max_col1
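For reference, a minimal self-contained sketch of this approach; the local SparkSession and the example dates are illustrative assumptions, while the column name col1 comes from the question:

from datetime import date
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Illustrative date column named col1, as in the question
df = spark.createDataFrame(
    [(date(2020, 1, 1),), (date(2021, 6, 15),), (date(2019, 12, 31),)],
    ["col1"],
)

# The aggregation runs on the executors; only the single-row result
# is returned to the driver, so there is no memory-overflow concern.
max_date = df.select(F.max("col1").alias("max_col1")).first().max_col1
print(max_date)  # 2021-06-15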
Upvotes: 6