Reputation: 39
I have a Spark data frame like below:
df =
Name    Date_1      Date_2      Roll.no
kiram   22-01-2020  23-01-2020  20
krish   24-02-2020  05-01-2020  25
verm    09-01-2020  25-02-2020  24
kirn    14-12-2019  25-01-2021  56
Now I want to get the max value of the date columns. The above is just an example for understanding; I could pick out the date columns by hand and find the max in each, but I want to do it automatically. To achieve that I used an if condition, but in Spark these date columns have the string dtype, so I wrapped it in try/except. However, when I run it I get nothing useful, as shown below:
for col in df:
    try:
        if dict(df.dtypes)[col] == 'string':
            min, max = df.select(min(col), max(col)).first()
            print(max)
    except:
        print(col, 'NA')
Output:
Column<'Name'> NA
Column<'Date_1'> NA
Column<'Date_2'> NA
Expected output:
Column<'Name'> NA
24-02-2020
25-01-2021
Is there a better way to identify the date columns and find their max values?
Upvotes: 0
Views: 7207
Reputation: 42342
I don't understand why you used try/except; the if statement should be enough. Also, you need to use the Spark SQL min/max functions instead of Python's built-ins, and avoid naming your variables min/max, which shadows those built-in functions.
import pyspark.sql.functions as F

for col in df.columns:
    if dict(df.dtypes)[col] == 'string':
        minval, maxval = df.select(F.min(col), F.max(col)).first()
        print(maxval)
    else:
        print(col, 'NA')
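
As a side note, because these columns are strings, the max above is a lexicographic comparison, which may not give the true latest date for dd-MM-yyyy values. Below is a minimal sketch of one way to detect date-like string columns and compare real dates instead; it assumes the dd-MM-yyyy pattern from the sample data and that ANSI mode is off (the default), so unparseable values come back as null rather than raising an error:

import pyspark.sql.functions as F

for c in df.columns:
    if dict(df.dtypes)[c] == 'string':
        # assumption: dates follow the dd-MM-yyyy pattern shown in the sample data
        parsed = F.to_date(F.col(c), 'dd-MM-yyyy')
        # a column counts as a date column only if it has at least one parseable
        # value and no non-null value that fails to parse
        parsed_count = df.filter(parsed.isNotNull()).count()
        unparsed_count = df.filter(F.col(c).isNotNull() & parsed.isNull()).count()
        if parsed_count > 0 and unparsed_count == 0:
            # compare as real dates, not as strings
            print(c, df.select(F.max(parsed)).first()[0])
        else:
            print(c, 'NA')
    else:
        print(c, 'NA')

This does an extra validation pass per string column, so on a large frame you might sample instead, but it keeps the "which columns are dates" logic explicit.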
Upvotes: 1