kunal nandwana

Reputation: 39

Apache Iceberg Schema Evolution using Spark

I am currently using Iceberg in my project, and I have a doubt about it.

My Current Scenario:

  1. I have loaded the data into my Iceberg table using a Spark DataFrame (I am doing this through a Spark job):

    df.writeTo("catalog.mydb.test2").using("iceberg").create()
    
  2. Now, from the source side, I have added two columns and started the job that performs the merge:

    df.createOrReplaceTempView("myview")
    spark.sql("MERGE INTO catalog.mydb.test2 as t USING (SELECT * FROM myview) as s ON t.id = s.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *")
    

After doing both of these steps, I expected the new columns to be added to the target table, but it did not work.

As far as I can see, Iceberg supports full schema evolution. What does that mean, if it is not adding any columns dynamically to my target table?

Please help: how can I achieve adding new columns to my target table dynamically?

Upvotes: 2

Views: 3203

Answers (2)

Joao Moniz

Reputation: 1

The updated Iceberg documentation is very clear about how to enable automatic schema merging if needed.

Using SQL:

ALTER TABLE prod.db.sample SET TBLPROPERTIES (
  'write.spark.accept-any-schema'='true'
)

Using Spark Dataframe API:

data.writeTo("prod.db.sample").option("mergeSchema","true").append()
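Putting the two pieces together, a minimal Scala sketch — assuming an existing SparkSession named `spark` and a DataFrame `data` whose schema is a superset of the table's (both names are placeholders):

```scala
// One-time setup: allow writes whose schema differs from the table's schema
spark.sql(
  """ALTER TABLE prod.db.sample SET TBLPROPERTIES (
    |  'write.spark.accept-any-schema'='true'
    |)""".stripMargin)

// Append with mergeSchema so any new columns in `data` are added to the table
data.writeTo("prod.db.sample")
  .option("mergeSchema", "true")
  .append()
```

Note that the table property and the write option work together: the `mergeSchema` option only takes effect once `write.spark.accept-any-schema` is set on the table.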

Upvotes: 0

liliwei

Reputation: 344

You can enable this with the mergeSchema option, but we don't recommend it because, as @shay__ points out, it can sometimes cause unmanageable catastrophes.
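If you'd rather avoid automatic merging, as this answer suggests, the explicit alternative is to evolve the table schema yourself before running the MERGE. A minimal sketch, where `col_a` and `col_b` are hypothetical names for the two new source columns (substitute your real column names and types):

```sql
-- Add the new columns to the target table explicitly,
-- then re-run the MERGE statement unchanged
ALTER TABLE catalog.mydb.test2 ADD COLUMNS (
  col_a string,
  col_b string
);
```

This keeps schema changes deliberate and reviewable instead of letting any upstream change flow into the table.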

Upvotes: 2
