Reputation: 387
I'm new to Hive and have read about it online, but I still have some doubts that haven't been cleared up.
For Hive external tables, Hive keeps the table's metadata within HDFS, but not in its warehouse (which is also in HDFS). Correct?
Whether it's an internal or an external table, in both cases the table's data is available only in HDFS and nowhere else. That is, data can be taken from anywhere but has to be loaded into HDFS, because Hive uses Hadoop's processing engine to process data. Correct?
For an internal table, both the table's metadata and the table's data are in Hive's data warehouse, and this data warehouse is nowhere else but in HDFS. Correct?
For an external table, neither the table's metadata nor the table's data is in Hive's data warehouse; both are elsewhere in HDFS. But Hive must keep some information of its own about where the table's metadata and data are located in HDFS. Correct?
Can anyone give feedback on the above understanding?
Thanks
Upvotes: 1
Views: 256
Reputation: 407
Everything seems correct except the last one. When you create an external table, the table metadata is still stored in Hive; otherwise you could not query the table through Hive. With an external table, the data files stay under your control in HDFS, whereas with an internal (managed) table Hive is responsible for them. Dropping an internal table drops both your data and the metadata, but dropping an external table only drops the metadata from Hive; your data remains in your file system. That's why we often change table types as a workaround when one of our external connections is not compatible with our Hive version.
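To make the drop behaviour concrete, here is a toy Python model (not Hive code). The class `ToyHive`, its methods, and all table names and paths are made up for illustration; it only mimics the default behaviour described above, where dropping a managed table removes data and metadata while dropping an external table removes metadata only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHive:
    """Toy stand-in for Hive: a metadata dict plus a set of storage paths."""
    metastore: dict = field(default_factory=dict)  # table name -> (kind, location)
    storage: set = field(default_factory=set)      # paths that currently hold data

    def create_table(self, name, location, external):
        kind = "EXTERNAL" if external else "MANAGED"
        self.metastore[name] = (kind, location)
        self.storage.add(location)

    def drop_table(self, name):
        # Metadata is always removed; data is removed only for managed tables.
        kind, location = self.metastore.pop(name)
        if kind == "MANAGED":
            self.storage.discard(location)


hive = ToyHive()
hive.create_table("sales_managed", "/warehouse/sales_managed", external=False)
hive.create_table("sales_external", "/data/landing/sales", external=True)

hive.drop_table("sales_managed")
hive.drop_table("sales_external")

# Both table definitions are gone, but the external table's files survive.
print(hive.metastore)        # {}
print(sorted(hive.storage))  # ['/data/landing/sales']
```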
Upvotes: 1
Reputation: 38335
Hive uses a relational database such as MySQL, MariaDB, PostgreSQL, Oracle, or Derby (for embedded deployment only) to store metadata (database and table definitions, statistics, grants, etc.). See deployment modes and database requirements. It does not matter whether the table is internal or external: the metadata is stored in the relational database.
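As a rough illustration of that point, the sketch below uses an in-memory SQLite database as a stand-in for the metastore's backing database. The two tables are loosely modelled on the metastore's `TBLS` and `SDS` tables but are heavily simplified (the real schema has many more columns and tables), and the table names and HDFS paths are hypothetical. The thing to notice is that managed and external tables both get a metadata row, and that row merely points at a location.

```python
import sqlite3

# In-memory SQLite stands in for the metastore database (MySQL, PostgreSQL, ...).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SDS  (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT, TBL_TYPE TEXT,
                   SD_ID INTEGER REFERENCES SDS(SD_ID));
""")

# Storage descriptors: where each table's data lives (hypothetical paths).
conn.executemany("INSERT INTO SDS VALUES (?, ?)", [
    (1, "hdfs://namenode/warehouse/sales_managed"),
    (2, "hdfs://namenode/data/landing/sales"),
])

# Table definitions: one managed, one external, both kept in the same database.
conn.executemany("INSERT INTO TBLS VALUES (?, ?, ?, ?)", [
    (1, "sales_managed", "MANAGED_TABLE", 1),
    (2, "sales_external", "EXTERNAL_TABLE", 2),
])

rows = conn.execute("""
    SELECT t.TBL_NAME, t.TBL_TYPE, s.LOCATION
    FROM TBLS t JOIN SDS s ON s.SD_ID = t.SD_ID
    ORDER BY t.TBL_ID
""").fetchall()

for row in rows:
    print(row)
```

So the table type is just another attribute in the metadata; the data files themselves never live in this database.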
Yes, the data is stored in HDFS, but Hive also supports integration with external databases through the JDBC storage handler. Such a table looks like a normal Hive table, but the data is stored in an external database, your queries are executed in that database, predicate push-down works, and you can combine native Hive tables with storage handler tables in a single query. HBase and Kafka storage handlers, among others, are also available, and you can write your own storage handler.
Depending on your Hive version and vendor, it is possible to create many tables (both managed and external at the same time) on top of the same location in HDFS. Cloudera, however, prefers managed tables to have a dedicated HDFS location (see https://stackoverflow.com/a/67073849/2700344) and by default does not allow you to specify a location for managed tables outside the warehouse root. Read about the difference between managed and external tables here.
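For the "many tables on one location" case, here is a small Python helper that just renders HiveQL text for two external tables pointing at the same directory. The helper `external_table_ddl`, the table names, the columns, and the path are all hypothetical, and the statements assume comma-delimited text files; whether your Hive distribution accepts them as-is depends on the version and vendor settings mentioned above.

```python
def external_table_ddl(name, columns, location):
    """Render a CREATE EXTERNAL TABLE statement for comma-delimited text files."""
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} ({cols})\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


shared_location = "/data/landing/sales"  # hypothetical HDFS directory

# Two table definitions over the same files: one reads every column,
# the other only declares the leading id column.
full_ddl = external_table_ddl(
    "sales_full",
    [("id", "INT"), ("amount", "DOUBLE"), ("region", "STRING")],
    shared_location,
)
ids_ddl = external_table_ddl("sales_ids", [("id", "INT")], shared_location)

print(full_ddl + ";\n")
print(ids_ddl + ";")
```

Both statements only add metadata entries; neither copies nor moves the files under the shared directory.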
Upvotes: 2