Thelight
Thelight

Reputation: 369

hive load external table from dynamic location(partitioned)

I need to create a hive table(external) to load data generated by another process. I need to be partitioned by date, But the problem is, there is a random string in the path.

Example input paths :

/user/hadoop/output/FDQM9N4RCQ3V2ZYT/20170314/
/user/hadoop/output/FDPWMUVVBO2X74CA/20170315/
/user/hadoop/output/FDPGNC0ENA6QOF9G/20170316/
.........
.........

Here 4th field in the directory is dynamic(which cannot be guessed). Each of these directories will have multiple .gz files

What location would I give while creating the table?

CREATE EXTERNAL TABLE user (
userId BIGINT,
type INT,
date String
)
PARTITIONED BY (date String)
LOCATION '/user/hadoop/output/';

is this correct? if so how do I partition it based on the date(last field in the directory)?

Upvotes: 0

Views: 576

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44921

Since you are not using the partition convention you'll have to add each partition manually.

  • The table location does not matter, but for clarity leave it as it is now.
  • I would recommend to use date type for the partition or at least the ISO format, YYYY-MM-DD.
  • I would not use date as a column name (nor int,string etc.).

PARTITIONED BY (dt date) 

alter table user add if not exists partition (dt=date '2017-03-14') location '/user/hadoop/output/FDQM9N4RCQ3V2ZYT/20170314';
alter table user add if not exists partition (dt=date '2017-03-15') location '/user/hadoop/output/FDPWMUVVBO2X74CA/20170315';
alter table user add if not exists partition (dt=date '2017-03-16') location '/user/hadoop/output/FDPGNC0ENA6QOF9G/20170316';

Upvotes: 1

Related Questions