Reputation: 369
I need to create a hive table(external) to load data generated by another process. I need to be partitioned by date, But the problem is, there is a random string in the path.
Example input paths :
/user/hadoop/output/FDQM9N4RCQ3V2ZYT/20170314/
/user/hadoop/output/FDPWMUVVBO2X74CA/20170315/
/user/hadoop/output/FDPGNC0ENA6QOF9G/20170316/
.........
.........
Here 4th field in the directory is dynamic(which cannot be guessed). Each of these directories will have multiple .gz files
What location would I give while creating the table?
CREATE EXTERNAL TABLE user (
userId BIGINT,
type INT,
date String
)
PARTITIONED BY (date String)
LOCATION '/user/hadoop/output/';
is this correct? if so how do I partition it based on the date(last field in the directory)?
Upvotes: 0
Views: 576
Reputation: 44921
Since you are not using the partition convention you'll have to add each partition manually.
date
type for the partition or at least the ISO format, YYYY-MM-DD
.date
as a column name (nor int
,string
etc.).PARTITIONED BY (dt date)
alter table user add if not exists partition (dt=date '2017-03-14') location '/user/hadoop/output/FDQM9N4RCQ3V2ZYT/20170314';
alter table user add if not exists partition (dt=date '2017-03-15') location '/user/hadoop/output/FDPWMUVVBO2X74CA/20170315';
alter table user add if not exists partition (dt=date '2017-03-16') location '/user/hadoop/output/FDPGNC0ENA6QOF9G/20170316';
Upvotes: 1