JKC
JKC

Reputation: 2618

Hive - fetch only the latest partition of one or more hive tables

I have three partitioned tables (Yearly Partitions) in Hive and all the tables have multiple partitions. I am joining all these three tables as part of my requirement. Now I want to run this sql only for the latest partition and not for the ones created before.

I tried to use max(partition) in where clause but it seems it is not supported

I did something like below (not the exact code. Just a code concept)

select
a.*,
b.*,
c.*
from table1 a
left join table2 b on a.ID = b.ID
left join table3 c on a.ID = c.ID
where
a.year = max(a.year) and
b.year = max(b.year) and
c.year = max(c.year)

I got this error

FAILED: SemanticException [Error 10128]: Line 108:23 Not yet supported place for UDAF 'max'

I can use multiple where clauses with a subquery containing "select max(year) from table" for all the table but that's not seem to be a feasible one. Any ideas on how to achieve this ?

UPDATE I tried the where clause with conditions below but it seems only one suq-query is supported in the where clause. Not sure how to resolve this. Appreciating any inputs on this

where
a.year in (select max(year) from table1) and
b.year in (select max(year) from table2) and
c.year in (select max(year) from table3

Upvotes: 2

Views: 3130

Answers (1)

mangusta
mangusta

Reputation: 3554

Modified version:

    select
    <columns>
    from  
    (  
     select 
     <columns> 
     from 
     table1 a 
     where a.year in (select max(year) from table1) 
    ) a1
    left join 
    (
     select 
     <columns> 
     from 
     table2 b 
     where b.year in (select max(year) from table2) 
    ) b1 on a1.ID = b1.ID
    left join 
    (
     select 
     <columns> 
     from 
     table3 c 
     where c.year in (select max(year) from table3) 
    ) c1 on a1.ID = c1.ID
;

Upvotes: 2

Related Questions