alonisser

Reputation: 12068

Delta Lake in Databricks - creating a table for existing storage

I currently have an append table in Databricks (Spark 3, Databricks Runtime 7.5):

parsedDf \
        .select("somefield", "anotherField", "partition", "offset") \
        .write \
        .format("delta") \
        .mode("append") \
        .option("mergeSchema", "true") \
        .save(f"/mnt/defaultDatalake/{append_table_name}")

It was created with a CREATE TABLE command earlier, and I don't use INSERT commands to write to it (only appends, as seen above).

Now I want to be able to query it with SQL logic, without going through createOrReplaceTempView every time. Is it possible to add a table for the current data without removing it? What changes do I need to support this?
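For context, the createOrReplaceTempView workaround I'm trying to avoid looks roughly like this (a minimal sketch; the view name is assumed for illustration):

# re-read the Delta files and expose them as a session-scoped view;
# "oplog_view" is an assumed name, not from my actual code
df = spark.read.format("delta").load(f"/mnt/defaultDatalake/{append_table_name}")
df.createOrReplaceTempView("oplog_view")
spark.sql("SELECT somefield, count(*) FROM oplog_view GROUP BY somefield").show()

The view only lives for the current Spark session, so it has to be recreated each time.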

UPDATE:

I've tried:

res = spark.sql(f"CREATE TABLE exploration.oplog USING DELTA LOCATION '/mnt/defaultDataLake/{append_table_name}'")

But I get an AnalysisException:

You are trying to create an external table exploration.dataitems_oplog from /mnt/defaultDataLake/specificpathhere using Databricks Delta, but the schema is not specified when the input path is empty.

The path, however, isn't empty.

Upvotes: 2

Views: 16876

Answers (1)

Alex Ott

Reputation: 87069

Starting with Databricks Runtime 7.0, you can create a table in the Hive metastore from existing data, automatically discovering the schema, partitioning, etc. (see the documentation for all details). The base syntax is as follows (replace the values in <> with actual values):

CREATE TABLE <database>.<table>
  USING DELTA
  LOCATION '/mnt/defaultDatalake/<append_table_name>'
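Once the table is registered, it can be queried directly from SQL cells or via spark.sql from Python. A minimal usage sketch, reusing the exploration.oplog name and the mount path from the question:

# register the existing Delta directory as an unmanaged table (schema and
# partitioning are discovered from the Delta transaction log), then query it
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS exploration.oplog
    USING DELTA
    LOCATION '/mnt/defaultDatalake/{append_table_name}'
""")
spark.sql("SELECT count(*) FROM exploration.oplog").show()

Note that the LOCATION here uses the same casing as the write path; if the mounted storage is case-sensitive, the defaultDataLake spelling from the question would resolve to a different, empty directory, which could explain the AnalysisException.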

P.S. There is more documentation on the different aspects of managed vs. unmanaged tables that could be useful to read.
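As a quick illustration of that distinction (a sketch, not part of the original answer): dropping an unmanaged table removes only the metastore entry, while the Delta files at the LOCATION stay in place.

# DROP TABLE on an unmanaged (external) table deletes only the metadata;
# the underlying Delta files remain readable straight from storage
spark.sql("DROP TABLE IF EXISTS exploration.oplog")
df = spark.read.format("delta").load(f"/mnt/defaultDatalake/{append_table_name}")
print(df.count())  # the appended rows are still there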

P.P.S. This works just fine for me on DBR 7.5 ML.

Upvotes: 2
