Reputation: 5480
I have a data frame in PySpark. It has a column called id whose values are unique.
Now I want to find the maximum value of the column id in the data frame.
I have tried the following:
df['id'].max()
But I got the error below:
TypeError: 'Column' object is not callable
Please let me know how to find the maximum value of a column in a data frame.
In the answer by @Dadep the link gives the correct answer
Upvotes: 14
Views: 48405
Reputation: 2424
I'm coming from Scala, but I believe the same approach also applies in Python:
max_id = df.select(max("id")).first()[0]
but you first have to import the following (note that it shadows Python's built-in max):
from pyspark.sql.functions import max
Upvotes: 2
Reputation: 741
You can use the max aggregate, as described in the PySpark documentation linked below:
Link : https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=agg
Code:
row1 = df1.agg({"id": "max"}).collect()[0]
Upvotes: 1
Reputation: 11
The following can be used in PySpark, after importing max from pyspark.sql.functions:
from pyspark.sql.functions import max
df.select(max("id")).show()
Upvotes: 1
Reputation: 2788
If you are using pandas, .max() will work:
>>> df2=pd.DataFrame({'A':[1,5,0], 'B':[3, 5, 6]})
>>> df2['A'].max()
5
Otherwise, if it's a Spark dataframe, see: Best way to get the max value in a Spark dataframe column
Upvotes: 23