Jake Z

Reputation: 1577

Hive not fully honoring fs.default.name/fs.defaultFS value in core-site.xml

I have the NameNode service installed on a machine called hadoop.

The core-site.xml file has fs.defaultFS (which supersedes the deprecated fs.default.name) set to the following:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop:8020</value>
</property>

I have a very simple table called test_table that currently exists in Hive and is stored on HDFS under /user/hive/warehouse/test_table. It was created with a very simple command in Hive:

CREATE TABLE test_table (record_id INT);

If I attempt to load data into the table locally (that is, using LOAD DATA LOCAL), everything proceeds as expected. However, if the data is stored on the HDFS and I want to load from there, an issue occurs.
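For reference, the local load that works as expected looks like this (the local path here is illustrative):

hive> LOAD DATA LOCAL INPATH '/home/haduser/test_table.csv' INTO TABLE test_table;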

I run a very simple query to attempt this load:

hive> LOAD DATA INPATH '/user/haduser/test_table.csv' INTO TABLE test_table;

Doing so leads to the following error:

FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''/user/haduser/test_table.csv'':
Move from: hdfs://hadoop:8020/user/haduser/test_table.csv to: hdfs://localhost:8020/user/hive/warehouse/test_table is not valid.
Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.

As the error states, it is attempting to move from hdfs://hadoop:8020/user/haduser/test_table.csv to hdfs://localhost:8020/user/hive/warehouse/test_table. The first path is correct because it references hadoop:8020; the second path is incorrect, because it references localhost:8020.

The core-site.xml file clearly states to use hdfs://hadoop:8020, and the hive.metastore.warehouse.dir value in hive-site.xml correctly points to /user/hive/warehouse. Thus, the conflict the error message suggests does not appear to exist.

How can I get the Hive server to use the correct NameNode address when creating tables?

Upvotes: 1

Views: 10048

Answers (1)

Jake Z

Reputation: 1577

I found that the Hive metastore tracks the location of each table. You can see that location by running the following in the Hive console:

hive> DESCRIBE EXTENDED test_table;
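In my case, the location field in that output (abbreviated and illustrative here) showed the stale NameNode address rather than the one in core-site.xml:

location:hdfs://localhost:8020/user/hive/warehouse/test_table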

This issue occurs if the NameNode address in core-site.xml was changed while the metastore service was still running, because the metastore keeps using the old value. To resolve the issue, restart the service on that machine:

$ sudo service hive-metastore restart

Then, the metastore will use the new fs.defaultFS for newly created tables.

Already Existing Tables

The location for tables that already exist can be corrected by running the following commands. These were obtained from Cloudera's documentation on configuring the Hive metastore for HDFS High Availability.

$ /usr/lib/hive/bin/metatool -listFSRoot
...
Listing FS Roots..
hdfs://localhost:8020/user/hive/warehouse
hdfs://localhost:8020/user/hive/warehouse/test.db

Correcting the NameNode location:

$ /usr/lib/hive/bin/metatool -updateLocation hdfs://hadoop:8020 hdfs://localhost:8020
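Note that the new location comes first and the old location second. If you want to preview the change before applying it, metatool also supports a dry run (shown here with the same illustrative addresses):

$ /usr/lib/hive/bin/metatool -updateLocation hdfs://hadoop:8020 hdfs://localhost:8020 -dryRun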

Now the listed NameNode is correct.

$ /usr/lib/hive/bin/metatool -listFSRoot
...
Listing FS Roots..
hdfs://hadoop:8020/user/hive/warehouse
hdfs://hadoop:8020/user/hive/warehouse/test.db

Upvotes: 5
