Reputation: 179
I am new to Hadoop and have just started working on Hive. My understanding is that it provides a query language to process data in HDFS, and with HiveQL we can create tables and load data into them from HDFS.
So my question is: where are those tables stored? Specifically, if we have a 100 GB file in our HDFS and we want to make a Hive table out of that data, what will be the size of that table and where will it be stored?
If my understanding of this concept is wrong, please correct me.
Upvotes: 9
Views: 7636
Reputation: 1082
Hive will create a directory on HDFS. If you don't specify a location, it creates the table's directory under /user/hive/warehouse on HDFS. After a LOAD command the files are moved into /user/hive/warehouse/<tablename>. You can also point Hive at an existing HDFS directory (including one laid out in partitions) by using the external table concept.
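As a minimal HiveQL sketch of the managed-table case (the table name, columns, and source path below are hypothetical):

    -- Managed table: with default settings its data lives under
    -- /user/hive/warehouse/my_table (name and schema are made up).
    CREATE TABLE my_table (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    -- LOAD DATA ... INPATH moves (not copies) the file from its original
    -- HDFS location into the table's warehouse directory.
    LOAD DATA INPATH '/user/me/input/data.csv' INTO TABLE my_table;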
Upvotes: 1
Reputation: 6913
If the table is 100 GB you should consider a Hive external table (as opposed to a "managed table"; for the difference, see this).
With an external table the data itself will still be stored on HDFS at the file path that you specify (note that you may specify a directory of files as long as they all have the same structure); Hive only records a mapping to it in the metastore, whereas a managed table stores the data "in Hive", i.e. under the warehouse directory.
When you drop a managed table, Hive drops the underlying data as well; dropping an external table only removes the metadata in the metastore that references the data.
Either way you are using only the 100 GB as viewed by the user, and you are taking advantage of HDFS's robustness through replication of the data.
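A minimal HiveQL sketch of the external-table approach (the path and columns here are hypothetical):

    -- External table: Hive stores only metadata; the files stay where they are.
    CREATE EXTERNAL TABLE my_ext_table (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/raw/my_dataset';

    -- DROP removes only the metastore entry; the 100 GB on HDFS is untouched.
    DROP TABLE my_ext_table;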
Upvotes: 5