Parijat Chakraborty

Reputation: 37

Azure Databricks Delta Table modifies the TIMESTAMP format while writing from Spark DataFrame

I am new to Azure Databricks. I am trying to write a DataFrame to a Delta table that contains a TIMESTAMP column, but the timestamp pattern changes after the write. In the DataFrame, the column holds the value in the format 2022-05-13 17:52:09.771, but after writing it to the table, the column value appears as

2022-05-13T17:52:09.771+0000

I am using the code below to generate this DataFrame output:

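import java.text.SimpleDateFormat
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{lit, to_timestamp}
val pretsUTCText = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
val tsUTCText: String = pretsUTCText.format(ts)
val tsUTCCol: Column = lit(tsUTCText)
// withColumn needs a column name first; "ts_utc" is a placeholder, as the original name is not shown
val df = df2.withColumn("ts_utc", to_timestamp(timestampConverter.tsUTCCol, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))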

The DataFrame returns 2022-05-13 17:52:09.771 as the TIMESTAMP pattern, but after writing it to the Delta table, the same value appears as 2022-05-13T17:52:09.771+0000.

I could not find any solution. Thanks in advance.

Upvotes: 1

Views: 4764

Answers (1)

Phuri Chalermkiatsakul

Reputation: 581

I have just run into the same behaviour on Databricks, and it differs from what the Databricks documentation describes. It seems that, after some version, Databricks displays the timezone by default, so you see the additional +0000. If you don't want it, I think you can use the date_format function when you populate the data. Also, I think you don't need the 'Z' in the format string: it only marks the timezone, and inside quotes it is treated as a literal character, so you can drop it from both the formatting and the parsing pattern.
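To make the date_format suggestion concrete, here is a minimal sketch of how it could look in Scala. The object name, the column names (raw_ts, event_ts, event_ts_text) and the local SparkSession are assumptions made up for this illustration and do not come from the question; on Databricks the notebook's existing spark session would be used instead. The sketch keeps the parsed TIMESTAMP column and adds a separate STRING column rendered with the pattern the asker expected.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, to_timestamp}

object TimestampTextSketch {
  // Hypothetical column names, chosen only for this illustration.
  val RawCol = "raw_ts"
  val TsCol = "event_ts"
  val TextCol = "event_ts_text"

  // Parses the raw text into a TIMESTAMP column, then renders that timestamp
  // back to text with an explicit pattern, so the stored string has no 'T'
  // separator and no +0000 suffix.
  def addTimestampColumns(df: DataFrame): DataFrame =
    df.withColumn(TsCol, to_timestamp(col(RawCol), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
      .withColumn(TextCol, date_format(col(TsCol), "yyyy-MM-dd HH:mm:ss.SSS"))

  def main(args: Array[String]): Unit = {
    // Local session for the sketch only; a Databricks notebook already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("timestamp-text-sketch")
      .getOrCreate()
    // Assumption: timestamps should be interpreted and rendered in UTC.
    spark.conf.set("spark.sql.session.timeZone", "UTC")
    import spark.implicits._

    val out = addTimestampColumns(Seq("2022-05-13T17:52:09.771Z").toDF(RawCol))
    out.printSchema() // event_ts is a timestamp, event_ts_text is a string
    out.show(truncate = false)
    spark.stop()
  }
}
```

Note the trade-off: the column produced by date_format is a STRING, not a TIMESTAMP, so it keeps the exact text but no longer behaves like a timestamp in comparisons or date arithmetic unless it is parsed again. If the only concern is how the value looks in query results, it may be enough to keep the TIMESTAMP column in the table and apply date_format when reading.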

Upvotes: 1
