user1411335
user1411335

Reputation: 3600

Efficient Join in hive without OR condition

I need to join geographical region table to user's table in Hive. geographical region can be country, state or city level. When geographical region is county level, I need to select all the listings in that county so on. My version of hive does not allow OR in join condition.

What is the most efficient way to write this query?

For example,

Region table

region_id , city, state, country
1,  Rome, NULL , IT
2, NULL, NULL, BM
3, VANCOUVER, BC, CA

User table

user_id, city , state, country
103 , VANCOUVER , BC , CA
105 , HAMILTON, NULL, BM
106 , NULL, NULL, BM

Result table

region_id, user_id, city, state, country
3, 103 , VANCOUVER , BC , CA
2, 105 , HAMILTON, NULL, BM
2, 106 , NULL, NULL, BM

Upvotes: 0

Views: 51

Answers (1)

Paul Maxwell
Paul Maxwell

Reputation: 35613

Well it may not be as efficient as you would like, but this should work:

SELECT DISTINCT
    coalesce(cty.region_id, sta.region_id, cou.region_id) as region_id, u.*
FROM users u
LEFT JOIN regions cty ON u.city = cty.city
LEFT JOIN regions sta ON u.state = sta.state
LEFT JOIN regions cou ON u.ccountyity = cou.county

and alternative would be:

SELECT
    r.region_id
  , u.*
FROM users u
INNER JOIN (
        SELECT
            regions.region_id, users.user_id
        FROM users
        INNER JOIN regions ON users.city = regions.city
        UNION
        SELECT
            regions.region_id, users.user_id
        FROM users
        INNER JOIN regions ON usesr.state = regions.state
        UNION
        SELECT
            regions.region_id, users.user_id
        FROM users
        INNER JOIN regions ON users.ccounty = regions.county
    ) r ON u.users_id = r.users_id

Upvotes: 1

Related Questions