Turcia

Reputation: 711

Hive creates an empty table even though there are plenty of files

I put some files into HDFS (/path/to/directory/) that contain data like the following:

63  EB44863EA74AA0C5D3ECF3D678A7DF59
62  FABBC9ED9719A5030B2F6A4591EDB180
59  6BF6D40AF15DE2D7E295EAFB9574BBF8

All of them are named like _user_hive_warehouse_file_name_000XYZ_A. These files were downloaded from another HDFS.

I'm trying to create an external table in Hive:

CREATE EXTERNAL TABLE users(
id int,
user string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/path/to/directory/';

It says:

OK
Time taken: 0.098 seconds

select * from users; returns no rows.

select count(1) from users; returns 0.

Hive creates the table successfully, but it is always empty. If I add another file, such as another.txt, containing the sample data shown above, select count(1) from users; returns 3.

What am I missing? Why is the table empty?

Environment:

Upvotes: 1

Views: 2556

Answers (2)

ceedee

Reputation: 196

When you run a query in Hive, it runs internally as a MapReduce job over the HDFS path where the files are stored. The job reads the files through FileInputFormat, which applies a hiddenFileFilter that ignores any file whose name starts with an underscore ("_") or a dot ("."). You can exclude additional files by passing a custom PathFilter class to FileInputFormat.setInputPathFilter. Hadoop treats files whose names start with an underscore as "special" files that hold job output and logs, which is probably why they are ignored.
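The naming rule described above can be illustrated in plain Java. This is a minimal standalone sketch that mirrors the name check and does not use Hadoop's own classes; the class `HiddenFileCheck` and its methods `isVisible` and `visibleFiles` are hypothetical names introduced only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hadoop: it only mirrors the naming rule.
class HiddenFileCheck {

    // A file is read only if its name starts with neither "_" nor ".".
    static boolean isVisible(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    // Keeps only the names that an input format applying this rule would read.
    static List<String> visibleFiles(List<String> fileNames) {
        return fileNames.stream()
                .filter(HiddenFileCheck::isVisible)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "_user_hive_warehouse_file_name_000XYZ_A", // skipped: leading underscore
                "another.txt",                             // read
                ".staging");                               // skipped: leading dot
        System.out.println(visibleFiles(names)); // prints [another.txt]
    }
}
```

Applied to the question, every file named _user_hive_warehouse_file_name_000XYZ_A is skipped, while another.txt is read, which matches the counts of 0 and 3 reported above.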

Upvotes: 2

rchang

Reputation: 5246

I think you are encountering an issue that is peripherally discussed in HIVE-6431. In particular, this comment is the important one:

By default, FileInputFormat(which is the super class of various formats) in hadoop ignores file name starts with "_" or ".", and hard to walk around this in hive codebase.

The workaround is probably to avoid using filenames that begin with _ or .
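One way to apply that workaround is to compute a new name for each file with the leading "_" or "." characters removed, then rename the files accordingly. The sketch below only computes the new name; `VisibleName` and `toVisibleName` are hypothetical names, and stripping the prefix is just one possible renaming scheme (it is an assumption that the stripped names do not collide with each other).

```java
// Hypothetical helper: derives a file name that the hidden-file rule will not skip.
class VisibleName {

    // Drops every leading '_' or '.' character from the file name.
    static String toVisibleName(String fileName) {
        int start = 0;
        while (start < fileName.length()
                && (fileName.charAt(start) == '_' || fileName.charAt(start) == '.')) {
            start++;
        }
        String stripped = fileName.substring(start);
        if (stripped.isEmpty()) {
            // Nothing is left to use as a name, so the caller must choose one.
            throw new IllegalArgumentException("No usable name left in: " + fileName);
        }
        return stripped;
    }

    public static void main(String[] args) {
        // prints user_hive_warehouse_file_name_000XYZ_A
        System.out.println(toVisibleName("_user_hive_warehouse_file_name_000XYZ_A"));
    }
}
```

The rename itself could then be done with hdfs dfs -mv for each file, after which the external table should pick the files up without being recreated.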

Upvotes: 3
