PPPP
PPPP

Reputation: 601

selecting incremental data from multiple tables in Hive

I have five tables(A,B,C,D,E) in Hive database and I have to union the data from these tables based on logic over column "id".

The condition is :

Select * from A
UNION 
select * from B (except  ids not in A)
UNION 
select * from C (except ids not in A and B)
UNION 
select * from D(except ids not in A,B and C)
UNION 
select * from E(except ids not in A,B,C and D)

Have to insert this data into final table.

One way is to create a the target table (target)and append it with data for each UNION stage and then using this table for joining with the other UNION stage.

This would be the part of my .hql file :

insert into target 
(select * from A
UNION 
select B.* from 
A 
RIGHT OUTER JOIN B
on A.id=B.id
where ISNULL(A.id));

INSERT INTO target
select C.* from 
target 
RIGHT outer JOIN C
ON target.id=C.id
where ISNULL(target.id);

INSERT INTO target
select D.* from 
target 
RIGHT OUTER JOIN D
ON target.id=D.id
where ISNULL(target.id);

INSERT INTO target
select E.* from 
target 
RIGHT OUTER JOIN E
ON target.id=E.id
where ISNULL(target.id);

Is there a better to make this happen ? I assume we anyway have to do the multiple joins/lookups .I am looking forward for best approach to achieve this in

1) Hive with Tez

2) Spark-sql

Many Thanks in advance

Upvotes: 2

Views: 690

Answers (2)

David דודו Markovitz
David דודו Markovitz

Reputation: 44951

If id is unique within each table, then row_number can be used instead of rank.

select      *

from       (select      *
                       ,rank () over
                        (
                            partition by    id
                            order by        src
                        )                           as rnk

            from        (           
                                    select 1 as src,* from a
                        union all   select 2 as src,* from b
                        union all   select 3 as src,* from c
                        union all   select 4 as src,* from d
                        union all   select 5 as src,* from e
                        ) t
            ) t

where       rnk = 1
;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269953

I think I would try to do this as:

with ids as (
      select id, min(which) as which
      from (select id, 1 as which from a union all
            select id, 2 as which from b union all
            select id, 3 as which from c union all
            select id, 4 as which from d union all
            select id, 5 as which from e
           ) x
     )
select a.*
from a join ids on a.id = ids.id and ids.which = 1
union all
select b.*
from b join ids on b.id = ids.id and ids.which = 2
union all
select c.*
from c join ids on c.id = ids.id and ids.which = 3
union all
select d.*
from d join ids on d.id = ids.id and ids.which = 4
union all
select e.*
from e join ids on e.id = ids.id and ids.which = 5;

Upvotes: 0

Related Questions