I want to know which partitions are getting hit when I run a hive query?

Question

I have a table having year, month and day as partition. I am trying to find an optimized way to read the data for last n days using parameters. The only way I can do this at the moment is by specifying each of the combination of year, month, and day individually which is very problematic if we have to read a lot of data, say for 1 month. Below is a sample example.

select count(*) from table 
where (year = 2021 and month = 7 and day = 5)
or (year = 2021 and month = 7 and day = 4)
or (year = 2021 and month 7 and day 3)

I am interested in knowing the following.

Can I use case when in where clause without impacting the performance? For example, will the below query read the same amount of data as the above query?

select count(*) from table 
where year = 2021 and month = 7 and (case when day between 4 and 7 then 1 else 0 end) = 1

How does partition work behind the scenes? I believe that the query gets converted into a map reduce job before execution. Will the both codes mentioned above will be converted to same map reduce job?
Can I use functions like case when freely with partitioned columns in where clause and will the hive query engine be able to interpret the function and scan the appropriate partitions?
Is there any built in function in hive to know which partitions are getting hit by the query? If not, is there any workaround? Is there any way to know the same in presto?

I want to know which partitions are getting hit when I run a hive query?

Answers (1)

Related Questions