I have some records of people in HDFS. I use an external table in Hive to view them and run my analytics on that data, and I can also use the same data externally in other programs.
Recently I got a use case where I have to update the data in HDFS. From the documentation I learned that we can't update or delete data through an external table.
Another problem is that the data is not in ORC format; it is in TEXTFILE format. So I can't update or delete it through an internal (managed) table either. Since it is in production, I can't copy it anywhere to convert it to ORC format. Please suggest how I can edit the data in HDFS.
You can update or delete rows with INSERT OVERWRITE, selecting from the same table and applying filters and transformations:
insert overwrite table mytable
select col1,  -- apply transformations here
       col2,  -- for example: case when col2 = something then something_else else col2 end as col2
       ...
       colN
from mytable
where ...;    -- a condition that is false for the records you want to delete
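This approach works for both external and managed tables and for all storage formats. Just write a SELECT that returns the complete dataset you want to keep (with the columns in the same order as the table definition) and put INSERT OVERWRITE in front of it.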
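As a concrete sketch, assume a hypothetical TEXTFILE table `people(id INT, name STRING, city STRING, status STRING)`; the table, its columns and the values below are illustrative only, not taken from the question. One statement performs an "update" (rewriting one city value) and a "delete" (dropping inactive rows). Note the `status IS NULL` check: a plain `status <> 'inactive'` evaluates to NULL for rows whose status is NULL, so those rows would silently disappear as well. Running the SELECT part on its own first is a cheap way to check the result before overwriting.

```sql
-- Hypothetical table (illustration only):
--   people(id INT, name STRING, city STRING, status STRING) STORED AS TEXTFILE
INSERT OVERWRITE TABLE people
SELECT id,
       name,
       -- "update": rewrite one value, keep every other city unchanged
       CASE WHEN city = 'Bombay' THEN 'Mumbai' ELSE city END AS city,
       status
FROM people
-- "delete": keep every row except the inactive ones (rows with a NULL status are kept)
WHERE status IS NULL OR status <> 'inactive';
```

Because the whole table is rewritten, any row the SELECT does not return is gone afterwards, so the WHERE clause should describe the rows to keep.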