Elsa Li

Reputation: 775

How to calculate max(date) and min(date) for a DateType column in a PySpark dataframe?

The dataframe has a date column stored as a string, e.g. '2017-01-01'.

It is converted to DateType():

from pyspark.sql.functions import col
from pyspark.sql.types import DateType

df = df.withColumn('date', col('date_string').cast(DateType()))
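
Converting with to_date also works (a sketch, assuming the strings are in the default yyyy-MM-dd format):

from pyspark.sql.functions import to_date

# to_date parses yyyy-MM-dd strings into DateType by default
df = df.withColumn('date', to_date('date_string'))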

I would like to calculate the first and last day in the column. I tried the following, but none of it works. Can anyone give any suggestions? Thanks!

df.select('date').min()
df.select('date').max()

df.select('date').last_day()
df.select('date').first_day()

Upvotes: 27

Views: 71215

Answers (2)

Ivan M.

Reputation: 547

An additional way to do it in one line:

import pyspark.sql.functions as F

df.agg(F.min("date"), F.max("date")).show()
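
If you need the values in Python rather than just printing them, you can alias the aggregates and read them off the first row (a minimal sketch; the min_date/max_date aliases are my own):

import pyspark.sql.functions as F

# agg returns a one-row dataframe; first() pulls that Row into the driver
row = df.agg(F.min("date").alias("min_date"),
             F.max("date").alias("max_date")).first()
min_date, max_date = row["min_date"], row["max_date"]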

Upvotes: 13

user9712091

Reputation:

Aggregate with min and max:

from pyspark.sql.functions import min, max

df = spark.createDataFrame(
    ["2017-01-01", "2018-02-08", "2019-01-03"], "string"
).selectExpr("CAST(value AS date) AS date")

min_date, max_date = df.select(min("date"), max("date")).first()
min_date, max_date
# (datetime.date(2017, 1, 1), datetime.date(2019, 1, 3))
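
The same aggregation can also be run through Spark SQL after registering the dataframe as a view (a sketch; the view name dates is arbitrary):

# expose the dataframe to SQL queries under a temporary name
df.createOrReplaceTempView("dates")
spark.sql("SELECT min(date) AS min_date, max(date) AS max_date FROM dates").show()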

Upvotes: 49
