Reputation: 3600
I need to join geographical region table to user's table in Hive
.
geographical region can be country, state or city level.
When geographical region is county level, I need to select all the listings in that county so on. My version of hive does not allow OR in join condition.
What is the most efficient way to write this query?
For example,
Region table
region_id , city, state, country
1, Rome, NULL , IT
2, NULL, NULL, BM
3, VANCOUVER, BC, CA
User table
user_id, city , state, country
103 , VANCOUVER , BC , CA
105 , HAMILTON, NULL, BM
106 , NULL, NULL, BM
Result table
region_id, user_id, city, state, country
3, 103 , VANCOUVER , BC , CA
2, 105 , HAMILTON, NULL, BM
2, 106 , NULL, NULL, BM
Upvotes: 0
Views: 51
Reputation: 35613
Well it may not be as efficient as you would like, but this should work:
SELECT DISTINCT
coalesce(cty.region_id, sta.region_id, cou.region_id) as region_id, u.*
FROM users u
LEFT JOIN regions cty ON u.city = cty.city
LEFT JOIN regions sta ON u.state = sta.state
LEFT JOIN regions cou ON u.ccountyity = cou.county
and alternative would be:
SELECT
r.region_id
, u.*
FROM users u
INNER JOIN (
SELECT
regions.region_id, users.user_id
FROM users
INNER JOIN regions ON users.city = regions.city
UNION
SELECT
regions.region_id, users.user_id
FROM users
INNER JOIN regions ON usesr.state = regions.state
UNION
SELECT
regions.region_id, users.user_id
FROM users
INNER JOIN regions ON users.ccounty = regions.county
) r ON u.users_id = r.users_id
Upvotes: 1