Reputation: 997
I'm new to Impala and I'm trying to understand how to delete records from a table. I've looked for a DELETE command but couldn't find clear instructions.
This is my table structure:
create table Installs (BrandID INT, PublisherID INT, InstallDate STRING, HourNum INT, Country STRING, Installs INT) PARTITIONED BY (day INT, month INT, year INT) STORED AS PARQUET
Is deletion possible in Hadoop? How does the syntax work? Any help would be appreciated. Thank you :)
Upvotes: 1
Views: 19862
Reputation: 769
Short answer: No, DELETE is not supported in Impala for HDFS-backed tables such as this Parquet one. The workaround is to rewrite the table data, leaving out the rows you want to delete.
Cloudera Impala supports SQL and can be used for data warehouse workloads, but it is not like a traditional RDBMS. Like Hive, it stores its files in HDFS (and is interoperable with Hive in many ways), so it is designed to store very large files in blocks.
Thus, like the HDFS it depends on, Impala is not designed to delete individual rows efficiently.
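As a rough sketch of that rewrite workaround, using the table from the question: the statement below overwrites a single partition with its own rows minus the unwanted ones. The partition values (day 1, month 1, year 2015) and the filter (Country = 'XX') are made-up examples, not something from the question. To my knowledge Impala allows an INSERT OVERWRITE to read from the table it is writing to, but try it on a copy of the data first.

```sql
-- Rewrite one partition, keeping everything except the rows to "delete".
-- The table is aliased as t so that the unqualified name Installs
-- refers to the column, not the table.
INSERT OVERWRITE TABLE Installs PARTITION (day = 1, month = 1, year = 2015)
SELECT BrandID, PublisherID, InstallDate, HourNum, Country, Installs
FROM Installs AS t
WHERE day = 1 AND month = 1 AND year = 2015
  -- Keep NULL countries explicitly: a bare Country <> 'XX' would drop them too.
  AND (Country IS NULL OR Country <> 'XX');

-- If the goal is to remove a whole partition, no rewrite is needed:
ALTER TABLE Installs DROP PARTITION (day = 1, month = 1, year = 2015);
```

Since the table is partitioned by day, month and year, limiting the rewrite to the affected partitions avoids recopying the whole table.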
Upvotes: 1
Reputation: 5881
From the book Learning Cloudera Impala:
Impala does not support dropping or deleting a row in a table. The alternative is to either drop the table or migrate the required data to other tables and then delete the entire original table.
To simulate the effects of an UPDATE or DELETE statement in other database systems, typically you use INSERT or CREATE TABLE AS SELECT to copy data from one table to another, filtering out or changing the appropriate rows during the copy operation.
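The copy-and-filter approach described above might look roughly like this for the table in the question. The name installs_kept and the filter Country = 'XX' are hypothetical placeholders. Treat this as a sketch: check the row counts in the new table before dropping anything.

```sql
-- 1. Create an empty table with the same columns, partitioning and file format.
CREATE TABLE installs_kept LIKE Installs;

-- 2. Copy only the rows to keep. With dynamic partitioning, the partition
--    columns go last in the SELECT list, in the order of the PARTITION clause.
--    The source table is aliased as t so that Installs means the column.
INSERT INTO installs_kept PARTITION (day, month, year)
SELECT BrandID, PublisherID, InstallDate, HourNum, Country, Installs,
       day, month, year
FROM Installs AS t
WHERE Country IS NULL OR Country <> 'XX';

-- 3. After verifying the copy, swap the tables.
--    For a managed (non-external) table, DROP TABLE also deletes the data files.
DROP TABLE Installs;
ALTER TABLE installs_kept RENAME TO Installs;
```

An UPDATE can be simulated the same way, by replacing a column in the SELECT list with an expression (for example a CASE) that computes the changed value.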
Upvotes: 2