arkiboys

Reputation: 39

Delete from delta parquet files in Azure Data Lake Storage Gen2

Using the code below, I can read from delta, but I'm not sure how to delete from it. I tried running a delete, but I get an error.

This is what I have. Can you see how to run the delete, please?

Thanks

df = spark.read.parquet(
   f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net{delta_folder_path}")

df_today = df.filter(f"year={yearNo} and month={monthNo} and day={dayNo}")

display(df_today)  # displays correctly

df_today.createOrReplaceTempView("vw_presentation")

Then, in another notebook cell, I am using:

%sql
--select * from vw_presentation --this select works fine

delete from vw_presentation where name = 'xyz'

This gives the error: Error in SQL statement: AssertionError: assertion failed: No plan for DeleteFromTable

I even tried it this way, but it still gives an error:

%sql

delete from delta.'/presentation/delivery/year=2022/month=05/day=26' 
   where name = 'xyz'

[Image: delta parquet structure]

Upvotes: 0

Views: 3500

Answers (1)

Alex Ott

Reputation: 87069

The main problem is that your source table is in Parquet format, not Delta, and Parquet doesn't support delete and update operations. If you want to perform such operations, you have two choices:

  • Convert Parquet files to Delta using the CONVERT TO DELTA SQL command
  • Use Spark code to perform what you need:
    • Read the full dataset
    • Keep only the rows you want to retain: df.filter("name != 'xyz'")
    • Write the data back using .mode("overwrite")
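
Both choices can be sketched roughly as follows. This is a minimal sketch, not a tested recipe for your environment: the helper names (abfss_path, convert_to_delta_sql, delete_via_delta, delete_via_rewrite) are hypothetical, the partition columns year/month/day are taken from the filter in the question, and their types (for example "year INT, month INT, day INT") are an assumption you would need to check against your data. The second option writes to a separate target path, since overwriting the same location a DataFrame is lazily reading from is risky.

```python
def abfss_path(container_name, storage_account_name, folder_path):
    """Build the ABFSS URI as in the question (folder_path is assumed to start with '/')."""
    return (
        f"abfss://{container_name}@{storage_account_name}"
        f".dfs.core.windows.net{folder_path}"
    )


def convert_to_delta_sql(path, partition_schema=None):
    """Build a CONVERT TO DELTA statement for a Parquet directory.

    partition_schema is a string such as "year INT, month INT, day INT"
    (the column types here are an assumption, not taken from the question).
    """
    sql = f"CONVERT TO DELTA parquet.`{path}`"
    if partition_schema:
        sql += f" PARTITIONED BY ({partition_schema})"
    return sql


def delete_via_delta(spark, path, partition_schema, condition):
    """Choice 1: convert the Parquet folder to Delta once, then delete in place."""
    spark.sql(convert_to_delta_sql(path, partition_schema))
    # Once the folder is a Delta table, DELETE is supported.
    spark.sql(f"DELETE FROM delta.`{path}` WHERE {condition}")


def delete_via_rewrite(spark, source_path, target_path, keep_condition,
                       partition_cols=("year", "month", "day")):
    """Choice 2: stay on Parquet and rewrite only the rows to keep."""
    df = spark.read.parquet(source_path)
    kept = df.filter(keep_condition)
    # Write to a different location than the one being read.
    (kept.write
         .mode("overwrite")
         .partitionBy(*partition_cols)
         .parquet(target_path))
```

In a notebook this might be called as delete_via_delta(spark, abfss_path(container_name, storage_account_name, delta_folder_path), "year INT, month INT, day INT", "name = 'xyz'"), pointing at the table root rather than a single day=... folder. For the rewrite approach, note that in Spark SQL the condition "name != 'xyz'" also drops rows where name is NULL; "name IS NULL OR name != 'xyz'" keeps them.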

Upvotes: 1
