Aamir

Reputation: 2424

Convert date field inside a DataFrame from any format to a fixed format in Spark Scala

I have a date column in my Spark DataFrame that contains dates in multiple string formats (it can be MM-dd-yyyy, dd-MM-yyyy, MM.dd.yyyy). I would like to cast them all to MM/dd/yyyy. I tried using regexes to differentiate between the formats and UDFs to convert them, but I couldn't make that approach fault-tolerant. I believe there are built-in SQL functions that could do this directly, without expensive and inefficient reformatting, but I haven't been able to find one that works.

Is there a better way to do this?

Upvotes: 0

Views: 72

Answers (1)

Subhasish Guha

Reputation: 232

A UDF does not work well here if it has to iterate over the rows. Distinguishing MM-dd-yyyy from dd-MM-yyyy purely from the data is impossible when both the day and the month are 12 or less; there is nothing you can do about that ambiguity. The best way to handle this is to have the source pass the date format along with the data: any source system will emit dates in a consistent format. If you can get the format for each date in a separate column, the problem is solved. If that is not possible, convert the column to Spark's default yyyy-MM-dd date format first, and then build your structure on top of it.
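A minimal sketch of that fallback, assuming Spark 3.x with ANSI mode off (the default), where `to_date` returns null when a pattern does not match, so `coalesce` keeps the first successful parse. The column name `date_str` and the candidate pattern list are illustrative assumptions, not from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, date_format, to_date}

val spark = SparkSession.builder().appName("normalize-dates").getOrCreate()
import spark.implicits._

// Illustrative data: mixed input formats in one string column.
val df = Seq("12-25-2019", "25-12-2019", "12.25.2019").toDF("date_str")

// Try each candidate pattern in turn. With ANSI mode off (the default),
// to_date yields null for a non-matching pattern, so coalesce picks the
// first pattern that parses. Order matters for ambiguous values such as
// "05-06-2019": the first matching pattern wins.
val candidates = Seq("MM-dd-yyyy", "dd-MM-yyyy", "MM.dd.yyyy")
val parsed = coalesce(candidates.map(f => to_date(col("date_str"), f)): _*)

// date_format renders the parsed DateType back out as MM/dd/yyyy text.
df.withColumn("date_fixed", date_format(parsed, "MM/dd/yyyy")).show()
```

If a per-row format column is available, a `when`/`otherwise` chain keyed on that column removes the ambiguity entirely, e.g. `when(col("fmt") === "dd-MM-yyyy", to_date(col("date_str"), "dd-MM-yyyy"))` for each known format (`fmt` here is a hypothetical column name).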

Upvotes: 1
