Ed Baker

Reputation: 663

Schema mismatch on insert using Delta (spark)

I've started playing around with Delta on EMR 6.9 and I'm attempting to perform a few basic operations to assess its suitability.

When I use Spark SQL to create a table and then insert data, I'm given an error:

An error was encountered:
A schema mismatch detected when writing to the Delta table (Table ID: a69c8802-0924-4129-9465-d0e5f2927547).
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.

Table schema:
root
 |-- id: string (nullable = true)
 |-- creation_date: string (nullable = true)
 |-- last_update_time: string (nullable = true)


Data schema:
root
 |-- col1: string (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: string (nullable = true)

This is the very simple code:

query = """
CREATE TABLE <<table_name>>(
    id string,
    creation_date string, 
    last_update_time string)
    USING delta 
    LOCATION "s3://my/s3/table/location/";
"""

spark.sql(query)

query = """
INSERT INTO <<table_name>> VALUES ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z")
"""

spark.sql(query)

I have already tried providing the field names in the insert, since the positional column mapping (col1/col2/col3 in the data schema) looked to be the issue. Example:

INSERT INTO <<table_name>> (id,creation_date,last_update_time) VALUES ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z")

But the same error is produced. Any help out there for a delta-lake noobie?

DeltaLake Version 2.1
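For reference, the workaround named in the error message itself can be applied as a session configuration before running the insert. This is a sketch only; `spark` is the existing SparkSession and `<<table_name>>` is a placeholder as above:

```python
# Session-level workaround quoted in the error message: allow Delta to
# auto-merge schemas for SQL operations (INSERT, MERGE, etc.).
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

spark.sql("""
INSERT INTO <<table_name>> VALUES ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z")
""")
```

Note that enabling auto-merge loosens schema enforcement for the whole session, so it is usually disabled again once the write succeeds.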

Upvotes: 0

Views: 1077

Answers (1)

Rohit Nimmala

Reputation: 1549

Maybe try creating a DataFrame and inserting it into the table using insertInto or saveAsTable, like below:

data = [('100', '2015-01-01', '2015-01-01T13:51:39.340396Z')]
columns = ['id', 'creation_date', 'last_update_time']
df = spark.createDataFrame(data, columns)

Create a new table using saveAsTable:

df.write.format('delta').saveAsTable('<<table_name>>') 

Or insert data into an existing table using insertInto (note that insertInto matches columns by position, not by name):

df.write.format('delta').insertInto('<<table_name>>')
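If the DataFrame write still raises the same mismatch, the other suggestion from the error message, the mergeSchema writer option, can be combined with a path-based append. A sketch, assuming the S3 location from the question and the `df` built above:

```python
# mergeSchema lets this single write evolve the table schema,
# per the '.option("mergeSchema", "true")' hint in the error message.
df.write.format('delta') \
    .option('mergeSchema', 'true') \
    .mode('append') \
    .save('s3://my/s3/table/location/')
```

Unlike the session-wide autoMerge config, this option only applies to the one write that sets it.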

Upvotes: 0
