Reputation: 333
I have a Spark DataFrame that is actually a huge Parquet file read from a container instance in Azure, and I want to convert it to Delta Lake format. But every time I try, it throws an error without any message attached.
I want to save it to Databricks itself or to the container instance (if possible).
I tried already df.write.format("delta").save("f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/my_data)
and
df.write.format("delta").saveAsTable("f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/my_data)
and
CREATE DELTA TABLE lifetime_delta
USING parquet
OPTIONS (f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/")
I do think I need to create a table somehow. I've heard that since Parquet is native to Delta Lake, the data should already be usable in a Delta Lake context, but for some reason that doesn't seem to be the case.
None of it worked for me. Thank you in advance.
Upvotes: 1
Views: 2660
Reputation: 19328
There are two main ways to convert Parquet files to a Delta Lake: rewrite the data with the Delta writer, or convert the existing Parquet files in place.
The first way is to rewrite the data, which is what your first attempt does:
df.write.format("delta").save("f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/my_data)
You may need to change it to match Python's f-string syntax: the f prefix belongs before the opening quote, and the closing quote was missing:
df.write.format("delta").save(f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/my_data")
The second way is to convert the existing Parquet files in place with DeltaTable.convertToDelta:
from delta import *
deltaTable = DeltaTable.convertToDelta(spark, "parquet.`tmp/lake2`")
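Applied to the path from your question, a sketch (assuming the Parquet files are unpartitioned and the same storage credentials are configured):

from delta.tables import DeltaTable

# Hypothetical path matching the one in the question.
path = f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/my_data"

# Converts in place: the Parquet data files are kept and a Delta
# transaction log is created alongside them.
deltaTable = DeltaTable.convertToDelta(spark, f"parquet.`{path}`")

After the conversion, the same path can be read back with spark.read.format("delta").load(path).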
Here's an example notebook with code snippets to perform this operation that you may find useful.
Upvotes: 3