Reputation: 42
I have a table having year, month and day as partition. I am trying to find an optimized way to read the data for last n days using parameters. The only way I can do this at the moment is by specifying each of the combination of year, month, and day individually which is very problematic if we have to read a lot of data, say for 1 month. Below is a sample example.
select count(*) from table
where (year = 2021 and month = 7 and day = 5)
or (year = 2021 and month = 7 and day = 4)
or (year = 2021 and month 7 and day 3)
I am interested in knowing the following.
select count(*) from table
where year = 2021 and month = 7 and (case when day between 4 and 7 then 1 else 0 end) = 1
How does partition work behind the scenes? I believe that the query gets converted into a map reduce job before execution. Will the both codes mentioned above will be converted to same map reduce job?
Can I use functions like case when freely with partitioned columns in where clause and will the hive query engine be able to interpret the function and scan the appropriate partitions?
Is there any built in function in hive to know which partitions are getting hit by the query? If not, is there any workaround? Is there any way to know the same in presto?
Upvotes: 0
Views: 440
Reputation: 38335
Partition pruning works fine with queries like this, the logic is like in your CASE expression:
where concat(year, '-', month, '-', day) >= '2021-07-04'
and
concat(year, '-', month, '-', day) <= '2021-07-07'
See this answer.
How to check how partition pruning works: Use EXPLAIN DEPENDENCY
or EXPLAIN EXTENDED
See this answer.
Upvotes: 1