data.is.world

Reputation: 39

How to get the max value of a date column in PySpark

I have a Spark data frame like the one below:

df =
   Name   Date_1     Date_2     Roll.no
   kiram  22-01-2020 23-01-2020  20
   krish  24-02-2020 05-01-2020  25
   verm   09-01-2020 25-02-2020  24
   kirn   14-12-2019 25-01-2021  56

Now I want to get the max value of the date columns. The above is just an example for understanding; I can identify the date columns by eye and find the max value in each, but I want to do it automatically. To achieve that I used an if condition, but in Spark the date columns' dtype is string, so I wrapped the check in try/except. However, when I run it I get nothing, as shown below:

for col in df:
  try:
    if dict(df.dtypes)[col]== 'string':
      min, max = df.select(min(col), max(col)).first()
      print(max)
  except:
    print(col, 'NA')

Output:

Column<'Name'> NA
Column<'Date_1'> NA
Column<'Date_2'> NA

Expected output:

Column<'Name'> NA
24-02-2020 
25-01-2021  

Is there a better way to find the date columns and get their max values?

Upvotes: 0

Views: 7207

Answers (1)

mck

Reputation: 42342

I don't understand why you used try/except; the if-statement should be enough. You also need to use the Spark SQL min/max functions instead of Python's built-ins, and avoid naming your variables min/max, which shadows those built-in functions.

import pyspark.sql.functions as F

for col in df.columns:
    if dict(df.dtypes)[col] == 'string':
        # use the Spark SQL aggregate functions, not Python's built-in min/max
        minval, maxval = df.select(F.min(col), F.max(col)).first()
        print(maxval)
    else:
        print(col, 'NA')
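
Side note: because Date_1 and Date_2 are stored as strings, F.max returns the lexicographic maximum, which is not necessarily the latest calendar date when the values are laid out as dd-MM-yyyy. Below is a minimal sketch, assuming all date columns use the dd-MM-yyyy format and Spark's default (non-ANSI) behaviour where to_date returns NULL for unparseable strings; it parses the values first and treats string columns that yield no valid dates as non-date columns.

import pyspark.sql.functions as F

for col in df.columns:
    if dict(df.dtypes)[col] == 'string':
        # parse the strings as dates; values that don't match the format (e.g. names) become NULL
        parsed = F.to_date(F.col(col), 'dd-MM-yyyy')
        maxval = df.select(F.max(parsed)).first()[0]
        if maxval is not None:
            print(maxval)
        else:
            # nothing parsed as a date, so treat this string column as non-date
            print(col, 'NA')
    else:
        print(col, 'NA')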

Upvotes: 1
