Reputation: 599
I have a hive external table with partition by year, month day and hour.
PARTITIONED BY (
`year` int,
`month` int,
`day` int,
`hour` int)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
'hdfs://path/to/data'
The data exists in directories such as
2014/05/10/07/00
2014/05/10/07/01
...
2014/05/10/07/22
2014/05/10/07/23
I get results When I select data using the following:
Select * from my_table where year=2014 and month="05" and day="07" and hour="03"
but I want to be able to query with out the quotes for values starting with a zero. Currently the following two examples don't work:
Select * from my_table where year=2014 and month=05 and day=07 and hour=03
Select * from my_table where year=2014 and month=5 and day=7 and hour=3
How can I support this? (instead of changing the directories not to have zero prefix on single digit values).
Thanks,
Guy
Upvotes: 0
Views: 515
Reputation: 3845
Before I go into the answer, this does involve changing the directory names but it will really make querying simple for you.
We have a similar kind of structure for our partitions but instead of using the names is this format 2014/05/10/07/22, we use it like 2014/201405/20140510/07/20140510.22. Basically the partitions are:
PARTITIONED BY
(
years bigint,
months bigint,
days bigint,
hours float
)
Now coming to the advantages of using this:
Query mentioned in the question:
Select * from my_table where year=2014 and month=05 and day=07 and hour=03
After new partitions
Select * from my_table where hour = 20140507.03
Also other queries on days and months can be run directly without explicitly specifying months and years.
Upvotes: 0