Rao
Rao

Reputation: 153

sub queries with view in hive 1.2.1

How can we achieve sub-queries in hive view. As I came to know that hive query won't allow sub-queries, we can achieve only by joins or unions. But I have a different scenario, where I can't apply any of them. All my tables were partitioned by updatedate and type columns. type will get as input parameter, And I have to get max updatedate at query.

Below is query

select ..........
From table1 t1
JOIN table2 t2 ON t1.t1id = t2.t2id
JOIN table3 t3 ON t1.t1id = t3.t3id
JOIN table4 t4 ON t1.t1id = t4.t4id
JOIN table5 t5 ON t1.t1id = t5.t5id
where 
AND t1.updatedate IN (select max(updatedate) updatedate from t1 where type = '${hiveconf:inputtype}' ) 
AND t2.updatedate IN (select max(updatedate) updatedate from t1 where type = '${hiveconf:inputtype}' ) 
AND t3.updatedate IN (select max(updatedate) updatedate from t3 where type = '${hiveconf:inputtype}' )
AND t4.updatedate IN (select max(updatedate) updatedate from t4 where type = '${hiveconf:inputtype}' )
AND t5.updatedate IN (select max(updatedate) updatedate from t5 where type = '${hiveconf:inputtype}' )  
-- ## Query is not working, it throws exception

I have tried as below, its working but here I applied group by on id.

select ..........
From table1 t1
JOIN (select max(updatedate) updatedate, t2id from t2 where type = '${hiveconf:inputtype}' group by t2id) t2 ON t1.t1id = t2.t2id
JOIN (select max(updatedate) updatedate, t3id from t3 where type = '${hiveconf:inputtype}' group by t3id) t3 ON t1.t1id = t3.t3id
JOIN (select max(updatedate) updatedate, t4id from t4 where type = '${hiveconf:inputtype}' group by t4id) t4 ON t1.t1id = t4.t4id
JOIN (select max(updatedate) updatedate, t5id from t5 where type = '${hiveconf:inputtype}' group by t5id) t5 ON t1.t1id = t5.t5id
where 
t1.updatedate IN (select max(updatedate) updatedate from t1 where type = '${hiveconf:inputtype}' ) 

Any suggestions to achieve this in better-way ?

Upvotes: 0

Views: 54

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269853

Absolutely. Use window functions:

SELECT ..........
FROM table1 t1 JOIN
     (SELECT t2.*,
             ROW_NUMBER() OVER (PARTITION BY t2.t2id ORDER BY t2.updateddate DESC) as seqnum
      FROM table2 t2
     ) t2
     ON t1.t1id = t2.t2id AND t2.seqnum = 1 JOIN
     . . . 

Continue for the rest of the tables.

Upvotes: 1

Related Questions