Reputation: 1945
Let's say I created a Hive external table "myTable" from the file myFile.csv (located in HDFS).
myFile.csv changes every day, so I would like to update "myTable" once a day as well.
Is there a HiveQL query that tells Hive to update the table every day?
Thank you.
P.S.
I would like to know if it works the same way with directories: let's say I create a Hive partition from the HDFS directory "myDir" when "myDir" contains 10 files. The next day "myDir" contains 20 files (10 files were added). Should I update the Hive partition?
Upvotes: 6
Views: 17266
Reputation: 8530
There are basically two types of tables in Hive.
One is the managed table, whose data lives in the Hive warehouse: when you load data into the table, it is copied or moved into the warehouse directory.
Queries then see only what was loaded, so changes made later to the original file do not show up in the query output.
The other is the external table, for which Hive does not copy the data into its warehouse.
Whenever you run a query on the table, it reads the data from the files at the table's location.
So the query output always reflects the latest data.
That is one of the goals of external tables.
You can even drop the table and the data is not lost.
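For contrast, here is a minimal HiveQL sketch of the managed-table side. The table name, the two columns, and the /data/incoming path are hypothetical and only there for illustration:

```sql
-- Managed table: Hive owns the data under its warehouse directory.
CREATE TABLE my_managed (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- LOAD DATA INPATH moves the HDFS file into the warehouse.
-- A new myFile.csv arriving tomorrow is not visible until it is loaded again.
LOAD DATA INPATH '/data/incoming/myFile.csv'
OVERWRITE INTO TABLE my_managed;

-- Dropping a managed table removes the metadata and the data files.
-- Dropping an external table removes only the metadata.
DROP TABLE my_managed;
```

With a managed table you would need to rerun the LOAD DATA step every day, which is exactly the work an external table saves you.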
Upvotes: 8
Reputation: 10931
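If you add a LOCATION clause to your CREATE TABLE statement, pointing at the HDFS directory that holds myFile.csv (for example LOCATION '/path/to/dir'; Hive expects a directory there, not a single file), you shouldn't have to update anything in Hive. Queries will always read the latest version of the files in that directory.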
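As a rough sketch, assuming a two-column comma-separated file and that LOCATION points at the HDFS directory containing it (Hive expects a directory there), the statements could look like this. The column names, the dt partition column, and the /data/... paths are made up for the example:

```sql
-- External table over the directory that holds myFile.csv.
CREATE EXTERNAL TABLE myTable (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/myTable';

-- Files are read at query time, so a replaced myFile.csv is picked up
-- without any extra statement.
SELECT * FROM myTable LIMIT 10;

-- Partitioned variant for the "myDir" case in the question.
CREATE EXTERNAL TABLE myPartitioned (id INT, name STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/myDir';

-- A new partition directory has to be registered once...
ALTER TABLE myPartitioned ADD IF NOT EXISTS
PARTITION (dt = '2024-01-01')
LOCATION '/data/myDir/dt=2024-01-01';

-- ...or discovered in bulk if directories follow the dt=... naming.
MSCK REPAIR TABLE myPartitioned;
```

Files added later to a directory that is already registered as a partition are read on the next query, so going from 10 to 20 files in "myDir" needs no update. Only brand-new partition directories need the ALTER TABLE or MSCK REPAIR TABLE step.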
Upvotes: 4