Rob

Reputation: 478

How do I convert multiple `string` columns in my dataframe to datetime columns?

I am in the process of converting multiple string columns to datetime columns, but I am running into the following issue:

Example column 1:

1/11/2018 9:00:00 AM

Code:

df = df.withColumn("column_name", to_timestamp(df.column_name, "MM/dd/yyyy hh:mm:ss aa"))

This works okay
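For reference, a minimal self-contained version of this step (the SparkSession setup and the event_time column name are illustrative, not from the original post; the single-letter pattern forms parse the same sample and also satisfy Spark 3's stricter parser):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("1/11/2018 9:00:00 AM",)], ["event_time"])
# "h" is the 12-hour clock and "a" the AM/PM marker; the single-letter
# forms also accept the unpadded month, day, and hour in the sample value
df = df.withColumn("event_time", to_timestamp(df.event_time, "M/d/yyyy h:mm:ss a"))
df.printSchema()  # event_time is now a timestamp column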

Example column 2:

2019-01-10T00:00:00-05:00

Code:

df = df.withColumn("column_name", to_date(df.column_name, "yyyy-MM-dd'T'HH:mm:ss'-05:00'"))

This works okay
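One caveat: the pattern above matches -05:00 as a fixed literal, so it only fits that one offset. If the offset can vary, the pattern letter XXX parses it instead. A sketch, with an illustrative start_date column name:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2019-01-10T00:00:00-05:00",)], ["start_date"])
# XXX parses any ISO-8601 offset such as -05:00 or +01:00,
# rather than matching one hard-coded literal
df = df.withColumn("start_date", to_date(df.start_date, "yyyy-MM-dd'T'HH:mm:ssXXX"))
df.show()  # start_date is now a date column: 2019-01-10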

Example column 3:

20190112

Code:

df = df.withColumn("column_name", to_date(df.column_name, "yyyyMMdd"))

This does not work. I get this error:

AnalysisException: "cannot resolve 'unix_timestamp(t.`date`, 'yyyyMMdd')' due to data type mismatch: argument 1 requires (string or date or timestamp) type, however, 't.`date`' is of int type."

I feel like it should be straightforward, but I am missing something.

Upvotes: 1

Views: 1234

Answers (1)

SCouto

Reputation: 7928

The error is pretty self-explanatory: you need your column to be a string, but it is an int. Are you sure your column is already a string? It seems not. You can cast it to string first with Column.cast:

from pyspark.sql.types import StringType

df = df.withColumn("column_name", to_date(df.column_name.cast(StringType()), "yyyyMMdd"))
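Put together, a runnable sketch of the fix (the date column name comes from your error message; the rest of the setup is illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(20190112,)], ["date"])  # numeric, not string, like the column in the error
# cast to string first, then parse with the yyyyMMdd pattern
df = df.withColumn("date", to_date(df["date"].cast(StringType()), "yyyyMMdd"))
df.printSchema()  # date is now a DateType column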

Upvotes: 1
