Yogesh

Reputation: 99

How to ALTER ADD COLUMNS of Delta table?

I want to add some columns to a Delta table using Spark SQL, but it's showing me an error like:

ALTER ADD COLUMNS does not support datasource table with type org.apache.spark.sql.delta.sources.DeltaDataSource.
You must drop and re-create the table for adding the new columns.
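
For example, a statement like this (the table and column names are just placeholders) fails with the error above:

spark.sql("ALTER TABLE my_delta_table ADD COLUMNS (new_col STRING)")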

Is there any way to alter my table in delta lake?

Upvotes: 4

Views: 5985

Answers (2)

Jacek Laskowski

Reputation: 74679

Thanks a lot for this question! I learnt quite a lot while hunting down a solution 👍


This is Apache Spark 3.2.1 and Delta Lake 1.1.0 (all open source).


The reason for the error is that Spark SQL (3.2.1) supports the ALTER TABLE ... ADD COLUMNS statement for the csv, json, parquet, and orc data sources only. For anything else, it throws this exception.
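
You can see this check in action by running the same statement against a parquet-backed table, which needs no extra configuration (the table name here is made up for illustration):

# parquet is on the supported list, so this succeeds out of the box
spark.range(5).write.format("parquet").saveAsTable("t_parquet")
spark.sql("ALTER TABLE t_parquet ADD COLUMNS (name STRING)")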

I assume you ran ALTER ADD COLUMNS using SQL (as the root cause would've been caught earlier had you used the Scala API or PySpark).

That leads us to org.apache.spark.sql.delta.catalog.DeltaCatalog, which has to be "installed" into Spark SQL for it to recognize Delta Lake as a supported datasource. This is described in the official Quickstart.

For PySpark (on the command line) it'd be as follows:

./bin/pyspark \
  --packages io.delta:delta-core_2.12:1.1.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
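
The same two properties can also be set programmatically when building the SparkSession (a sketch; the app name is arbitrary, and spark.jars.packages only takes effect if no Spark context is running yet):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-demo")
    # fetch Delta Lake at startup (same artifact as --packages above)
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.1.0")
    # the two mandatory properties described below
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)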

In order to extend Spark SQL with Delta Lake's features (incl. ALTER ADD COLUMNS support) you have to add the following configuration properties for DeltaSparkSessionExtension and DeltaCatalog:

  1. spark.sql.extensions
  2. spark.sql.catalog.spark_catalog

They are mandatory in plain open-source Spark; managed environments like Azure Databricks (mentioned as working fine) already set them up for you, for obvious reasons.
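
With both properties in place, a quick smoke test (with a throwaway table name) shows ALTER ADD COLUMNS now going through:

spark.sql("CREATE TABLE delta_demo (id LONG) USING delta")
spark.sql("ALTER TABLE delta_demo ADD COLUMNS (name STRING)")
spark.table("delta_demo").printSchema()  # now shows both id and name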

Upvotes: 3

Abhishek Khandave

Reputation: 3230

ALTER ADD COLUMNS does not support datasource table with type org.apache.spark.sql.delta.sources.DeltaDataSource. You must drop and re-create the table for adding the new columns.

You may be facing this issue because of mismatched versions in your configuration. Make sure you are using compatible versions of Spark and Scala.
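
As a quick sanity check (a sketch, nothing Delta-specific): print the versions of the running session. The Scala suffix in the delta-core artifact (_2.12) has to match the Scala build of your Spark distribution, and the Delta Lake release line has to match the Spark minor version (e.g. Delta Lake 1.1.0 pairs with Spark 3.2.x):

print(spark.version)  # e.g. '3.2.1' -> use io.delta:delta-core_2.12:1.1.0
# Scala build of the JVM-side Spark (goes through the private py4j handle, but handy)
print(spark.sparkContext._jvm.scala.util.Properties.versionString())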

As mentioned in the comments, you can alter a table in Delta Lake; follow the link mentioned by @David דודו Markovitz.

Refer to this similar issue.

Upvotes: 0
