Guy Wald
Guy Wald

Reputation: 599

Map a hive partition to a location

I have a hive external table with partition by year, month day and hour.

PARTITIONED BY ( 
  `year` int, 
  `month` int, 
  `day` int, 
  `hour` int)
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.SequenceFileInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
    'hdfs://path/to/data'

The data exists in directories such as

2014/05/10/07/00

2014/05/10/07/01

...

2014/05/10/07/22

2014/05/10/07/23

I get results When I select data using the following:

Select * from my_table where year=2014 and month="05" and day="07" and hour="03"

but I want to be able to query with out the quotes for values starting with a zero. Currently the following two examples don't work:

Select * from my_table where year=2014 and month=05 and day=07 and hour=03
Select * from my_table where year=2014 and month=5 and day=7 and hour=3

How can I support this? (instead of changing the directories not to have zero prefix on single digit values).

Thanks,

Guy

Upvotes: 0

Views: 515

Answers (1)

Amar
Amar

Reputation: 3845

Before I go into the answer, this does involve changing the directory names but it will really make querying simple for you.

We have a similar kind of structure for our partitions but instead of using the names is this format 2014/05/10/07/22, we use it like 2014/201405/20140510/07/20140510.22. Basically the partitions are:

 PARTITIONED BY 
  (
  years bigint,
  months bigint,
  days bigint,
  hours float
  )  

Now coming to the advantages of using this:

Query mentioned in the question:

Select * from my_table where year=2014 and month=05 and day=07 and hour=03

After new partitions

Select * from my_table where hour = 20140507.03

Also other queries on days and months can be run directly without explicitly specifying months and years.

Upvotes: 0

Related Questions