Anoop Mamgain
Anoop Mamgain

Reputation: 187

Where is HIVE metadata stored by default?

I have created an external table in Hive using following:

create external table hpd_txt(
WbanNum INT,
YearMonthDay INT , 
Time INT, 
HourlyPrecip INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
stored as textfile
location 'hdfs://localhost:9000/user/hive/external';

Now this table is created in location */hive/external.

Step-1: I loaded data in this table using:

load data inpath '/input/hpd.txt' into table hpd_txt;

the data is successfully loaded in the specified path ( */external/hpd_txt)

Step-2: I delete the table from */hive/external path using following:

hadoop fs -rmr /user/hive/external/hpd_txt

Questions:

  1. why is the table deleted from original path? (*/input/hpd.txt is deleted from hdfs but table is created in */external path)
  2. After I delete the table from HDFS as in step 2, and again I use show tables; It still gives the table hpd_txt in the external path. so where is this coming from.

Thanks in advance.

Upvotes: 1

Views: 15236

Answers (4)

Anoop Mamgain
Anoop Mamgain

Reputation: 187

After going through you helping comments and other posts, I have found answer to my question. If I use LOAD INPATH command then it "moves" the source file to the location where external table is being created. Which although, wont be affected in case of dropping the table, but changing the location is not good. So use local inpath in case of loading data in Internal tables .

To load data in external tables from a file located in the HDFS, use the location in the CREATE table query which will point to the source file, for example:

create external table hpd(WbanNum string,
YearMonthDay string , 
Time string, 
hourprecip string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
stored as textfile
location 'hdfs://localhost:9000/input/hpd/';

So this sample location will point to the data already present in HDFS in this path. so no need to use LOAD INPATH command here.

Its a good practice to store a source files in their private dedicated directories. So that there is no ambiguity while external tables are created as data is in a properly managed directory system.

Thanks a lot for helping me understand this concept guys! Cheers!

Upvotes: 0

Ravindra babu
Ravindra babu

Reputation: 38950

  1. Do not use load path command. The Load operation is used to MOVE ( not COPY) the data into corresponding Hive table. Use put Or copyFromLocal to copy file from non HDFS format to HDFS format. Just provide HDFS file location in create table after execution of put command.

  2. Deleting a table does not remove HDFS file from disk. That is the advantage of external table. Hive tables just stores metadata to access data files. Hive tables store actual data of data file in HIVE tables. If you drop the table, the data file is untouched in HDFS file location. But in case of internal tables, both metadata and data will be removed if you drop table.

Upvotes: 0

Anil
Anil

Reputation: 420

The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes is handy if you already have data generated. Else, you will have data loaded (conventionally or by creating a file in the directory being pointed by the hive table) When dropping an EXTERNAL table, data in the table is NOT deleted from the file system. An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.

Source: Hive docs

So, in your step 2, removing the file /user/hive/external/hpd_txt removes the data source(data pointing to the table) but the table still exists and would continue to point to hdfs://localhost:9000/user/hive/external as it was created

@Anoop : Not sure if this answers your question. let me know if you have any questions further.

Upvotes: 0

invoketheshell
invoketheshell

Reputation: 3897

Hive doesn't know that you deleted the files. Hive still expects to find the files in the location you specified. You can do whatever you want in HDFS and this doesn't get communicated to hive. You have to tell hive if things change.

hadoop fs -rmr /user/hive/external/hpd_txt

For instance the above command doesn't delete the table it just removes the file. The table still exists in hive metastore. If you want to delete the table then use:

drop if exists tablename;

Since you created the table as an external table this will drop the table from hive. The files will remain if you haven't removed them. If you want to delete an external table and the files the table is reading from you can do one of the following:

  1. Drop the table and then remove the files
  2. Change the table to managed and drop the table

Finally the location of the metastore for hive is by default located here /usr/hive/warehouse.

Upvotes: 1

Related Questions