Reputation: 87
I have a Delta table with 3 columns of data. The incoming data now has 4 columns, so DF.writeStream needs to update the data location with the fourth column automatically, so that we can recreate the table on top of the data location. The old records should then have nulls in the newly added column, and the recent data should have all 4 columns populated.
e.g.:
id  name  addr            id  name  addr  phone
1   lok   UK     ---->    1   lok   UK    null
                          2   ram   US    +1234
But when I use the following command, taken from the Databricks website, I get the error shown after it:
df.writeStream
.option("mergeSchema", "true")
.format("delta")
.outputMode("append")
.option("path","/data/")
.option("checkpointLocation","/checkpoint/")
.start()
.awaitTermination()
ERROR: A schema mismatch detected when writing to the Delta table
To enable schema migration, please set:
'.option("mergeSchema", "true")'.
But I am already using mergeSchema in the options. Please advise. NOTE: the .saveAsTable and .table functions are also not allowed in writeStream.
Upvotes: 0
Views: 6002
Reputation: 21
You probably need to change the checkpoint location.
For details, see the document here.
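A rough sketch of what that could look like, assuming PySpark syntax and keeping the writer from the question. The function name and the `/checkpoint_v2/` path are made up for illustration. Keep in mind that a new checkpoint directory carries no saved progress, so the query may reprocess data from the source's starting position.

```python
def start_with_fresh_checkpoint(df, delta_path="/data/",
                                checkpoint_path="/checkpoint_v2/"):
    """Start the append stream from the question with a new checkpoint.

    `df` is assumed to be a streaming DataFrame. The default
    checkpoint path is hypothetical; any directory that has not been
    used by an earlier run of this query should do.
    """
    return (
        df.writeStream
        .format("delta")
        .outputMode("append")
        .option("mergeSchema", "true")  # same option as in the question
        .option("path", delta_path)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

You would then call `start_with_fresh_checkpoint(df).awaitTermination()` as before.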
Upvotes: 2
Reputation: 45
You can try using foreachBatch and then writing each batch in Delta format. This works for me.
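Roughly like this, as a minimal sketch in PySpark (an assumption, since the question does not say which language is used). The paths are the ones from the question, and the function names are made up for illustration. Each micro-batch is written with the batch writer, where mergeSchema is a documented Delta write option.

```python
# Paths taken from the question; replace them with your own locations.
DELTA_PATH = "/data/"
CHECKPOINT_PATH = "/checkpoint/"


def write_batch(batch_df, batch_id):
    """Append one micro-batch to the Delta path with the batch writer."""
    (
        batch_df.write
        .format("delta")
        .mode("append")
        .option("mergeSchema", "true")  # let the write add the new column
        .save(DELTA_PATH)
    )


def start_stream(df):
    """Start the query; `df` is assumed to be a streaming DataFrame."""
    return (
        df.writeStream
        .foreachBatch(write_batch)  # called once per micro-batch
        .option("checkpointLocation", CHECKPOINT_PATH)
        .start()
    )
```

Then `start_stream(df).awaitTermination()` takes the place of the original writeStream chain.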
Upvotes: 0