harryboot

Reputation: 23

SparkSQL DELETE command doesn't delete one single row in Apache Iceberg, does it?

I use Spark SQL 3.0 with Scala 2.12. I can insert data into the Iceberg table and read data from the table successfully, but when I try to delete one wrong record from the table with Spark SQL, the log shows an exception. Issue 1444 of Apache Iceberg on GitHub says Iceberg supports row-level deletes in the latest version, so why did my delete fail? The main Iceberg version I use is 0.10.0; the package org.apache.iceberg.iceberg-hive is version 0.9.1. Please help! My Spark SQL code segment is:

public static void deleteSingleDataWithoutCatalog3(){
    // SparkSQL Configure
    SparkConf sparkSQLConf = new SparkConf();
    // 'hadoop_prod' is the name of the catalog, which is used when accessing the table
    sparkSQLConf.set("spark.sql.catalog.hadoop_prod", "org.apache.iceberg.spark.SparkCatalog");
    sparkSQLConf.set("spark.sql.catalog.hadoop_prod.type", "hadoop");
    sparkSQLConf.set("spark.sql.catalog.hadoop_prod.warehouse", "hdfs://hadoop01:9000/warehouse_path/");
    sparkSQLConf.set("spark.sql.sources.partitionOverwriteMode", "dynamic");
    
    SparkSession spark = SparkSession.builder().config(sparkSQLConf).master("local[2]").getOrCreate();
    // String selectDataSQLALL = "select * from  hadoop_prod.xgfying.booksSpark3 ";
    String deleteSingleDataSQL = "DELETE FROM  hadoop_prod.xgfying.booksSpark3 where price=33 ";
    // spark.sql(deleteSingleDataSQL);
    spark.table("hadoop_prod.xgfying.booksSpark3").show();
    spark.sql(deleteSingleDataSQL);
    spark.table("hadoop_prod.xgfying.booksSpark3").show();
}

when the code runs, the exception message is:

......
Exception in thread "main" java.lang.IllegalArgumentException: Failed to cleanly delete data files matching: ref(name="price") == 33
        at org.apache.iceberg.spark.source.SparkTable.deleteWhere(SparkTable.java:168)
......
Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot delete file where some, but not all, rows match filter ref(name="price") == 33: hdfs://hadoop01:9000/warehouse_path/xgfying/booksSpark3/data/title=Gone/00000-1-9070110f-35f8-4ee5-8047-cca2a1caba1f-00001.parquet
......

Upvotes: 0

Views: 2141

Answers (1)

Temitope Fatayo

Reputation: 419

I know this is a rather old question, but I ran into a similar issue recently and was able to fix it by adding spark.sql.extensions to the Spark config:

--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions 
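If you build the session in code rather than passing flags to spark-submit, the same setting can be applied on the SparkConf. A minimal sketch based on the question's own setup (the catalog name, warehouse path, and table name are taken from the question; whether this works also depends on having the matching iceberg-spark3-extensions jar on the classpath, which is an assumption here):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class IcebergDeleteExample {
    public static void main(String[] args) {
        SparkConf sparkSQLConf = new SparkConf();
        // Register Iceberg's SQL extensions so Spark's parser supports
        // row-level DELETE instead of only the metadata-level deleteWhere
        // path, which fails when a data file matches the filter partially.
        sparkSQLConf.set("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions");
        sparkSQLConf.set("spark.sql.catalog.hadoop_prod",
                "org.apache.iceberg.spark.SparkCatalog");
        sparkSQLConf.set("spark.sql.catalog.hadoop_prod.type", "hadoop");
        sparkSQLConf.set("spark.sql.catalog.hadoop_prod.warehouse",
                "hdfs://hadoop01:9000/warehouse_path/");

        SparkSession spark = SparkSession.builder()
                .config(sparkSQLConf)
                .master("local[2]")
                .getOrCreate();

        // With the extensions registered, this row-level delete is rewritten
        // by Iceberg rather than rejected with a ValidationException.
        spark.sql("DELETE FROM hadoop_prod.xgfying.booksSpark3 WHERE price = 33");
    }
}
```

The reason the original code failed is visible in the stack trace: without the extensions, Spark routes the DELETE to SparkTable.deleteWhere, which can only drop whole data files whose rows all match the filter; the Parquet file in the error contains both matching and non-matching rows, hence "some, but not all, rows match filter".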

Upvotes: 1
