banjara
banjara

Reputation: 3890

query hive partitioned table over date/time range

My hive table is partitioned on year, month, day, Hour

Now I want to fetch data from 2014-05-27 to 2014-06-05 How can I do that??

I know one option is create partition on epoch(or yyyy-mm-dd-hh) and in query pass epoch time. Can I do it without loosing date hierarchy??

Table Structure

CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year int, month int, day int, hour int) 
STORED AS TEXTFILE;

Upvotes: 7

Views: 15249

Answers (3)

Sonal Dubey
Sonal Dubey

Reputation: 77

You can use CONCAT with LPAD.

Say you want to get all partitions between 2020-03-24, hour=00 to 2020-04-24, hour=23, then, your 'where' condition would look like:

WHERE (CONCAT(year, '-', LPAD(month,2,'0'), '-', LPAD(day,2,'0'), '_', LPAD(hour,2,'0')) > '2020-03-24_00')
AND (CONCAT(year, '-', LPAD(month,2,'0'), '-', LPAD(day,2,'0'), '_', LPAD(hour,2,'0')) < '2020-04-24_23')

Upvotes: 1

Amar
Amar

Reputation: 3845

This is a similar scenario we face everyday while querying tables in hive. We have partitioned our tables similar to the way you explained and it has helped a lot if querying. This is how we partition:

CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year bigint, month bigint, day bigint, hour int) 
STORED AS TEXTFILE;

For partitions we assign values like this:

year = 2014, month = 201409, day = 20140924, hour = 01

This way the querying becomes really simple and you can directly query:

select * from table1 where day >= 20140527 and day < 20140605 

Hope this helps

Upvotes: 11

dpsdce
dpsdce

Reputation: 5450

you can query like this

  WHERE st_date > '2014-05-27-00' and end_date < '2014-06-05-24' 

should give you desired result because even if it is a sting a it will be compared lexicographically i.e '2014-04-04' will be always greater '2014-04-03'.

I ran it on my sample tables and it works perfectly fine.

Upvotes: 3

Related Questions