Sunny
Sunny

Reputation: 369

How to delete rows in hive hadoop database

I'm a newbie with hadoop & hive. I want to delete certain rows in my database - which is on hive-hadoop. I know its not supported out of the box, and that hadoop is a read only file system. I'm curious about what are the best approaches for accomplishing this. If anyone has done this before, can they share their learnings/procedures?

Thanks!

Upvotes: 4

Views: 39764

Answers (4)

user6216847
user6216847

Reputation: 41

INSERT OVERWRITE TABLE mytable  SELECT * FROM mytable WHERE 1=0;

The above query first deletes the data files, as OVERWRITE option is specified in the query. Next it will execute SELECT to read the records from the table, as condition fails, SELECT query will get zero records and insert zero records into the table.

Upvotes: 4

Henaras
Henaras

Reputation: 21

Or you could insert the rest of of rows (i.e., those you want to keep) into another table and then drop the old table.

Upvotes: 1

Jerome Banks
Jerome Banks

Reputation: 1630

In Big Data there really aren't deletes. That said, you can overwrite your table or partition if it isn't too big, or isolate your deletes to a particular partition like JamCon suggests.

For datasets which are not too huge, you can do something like

INSERT OVERWRITE TABLE mytable
SELECT * FROM mytable
WHERE ID NOT IN ( 'delete1', 'delete2', 'delete3');

Upvotes: 13

JamCon
JamCon

Reputation: 2333

The best approach is to partition your data such that the rows you want to drop are in a partition unto themselves. You can then drop the partition without impacting the rest of your table. This is a fairly sustainable model, even if your dataset grows quite large.

Upvotes: 3

Related Questions