Sudhanshu Sekhar
Sudhanshu Sekhar

Reputation: 133

Hive Query- Joining two tables on three joining conditions with OR operator

I am facing an error

"FAILED: Error in semantic analysis: Line 1:101 OR not supported in JOIN currently dob"

while running the below mentioned query..

Insert Overwrite Local Directory './Insurance_Risk/Merged_Data' Select f.name,s.age,f.gender,f.loc,f.marital_status,f.habits1,f.habits2,s.employement_status,s.occupation_class,s.occupation_subclass,s.occupation from sample_member_detail s Join fb_member_detail f 
On s.email=f.email or 
s.dob=f.dob 
or (f.name=s.name and f.loc = s.loc and f.occupation=s.occupation)
where s.email is not null and f.email is not null;

Can anyone tell me that, in hive "OR" operator can be used or not? if not, then what should be the query which will give the same result as given by the above mentioned query. I have 2 tables and I want to join the two tables on any one of the three conditions with or operator. Please help..

Upvotes: 13

Views: 36286

Answers (2)

Matt Faus
Matt Faus

Reputation: 6691

You could also use UNION to get the same results:

INSERT OVERWRITE LOCAL DIRECTORY './Insurance_Risk/Merged_Data' 
-- You can only UNION on subqueries
SELECT * FROM (
    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON s.email=f.email 
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL;

    UNION

    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON s.dob=f.dob
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL;

    UNION

    SELECT f.name,
        s.age,
        f.gender,
        f.loc,
        f.marital_status,
        f.habits1,
        f.habits2,
        s.employement_status,
        s.occupation_class,
        s.occupation_subclass,
        s.occupation 
    FROM sample_member_detail s 
    JOIN fb_member_detail f 
    ON f.name=s.name AND f.loc = s.loc AND f.occupation=s.occupation
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL;

) subquery;

Upvotes: 8

www
www

Reputation: 4391

Sorry Hive supports only equi-joins. You can always try select from full Cartesian product of those tables(you have to be in non-strict mode):

Select f.name,s.age,f.gender,f.loc,f.marital_status,f.habits1,f.habits2,s.employement_status,s.occupation_class,s.occupation_subclass,s.occupation 
from sample_member_detail s join fb_member_detail f 
where (s.email=f.email 
or s.dob=f.dob 
or (f.name=s.name and f.loc = s.loc and f.occupation=s.occupation))
and s.email is not null and f.email is not null;

Upvotes: 9

Related Questions