Reputation: 291
Data Table
Name Company Continent country state district
Tom HP Asia India Assam Kdk
George SAP Africa Sudan Chak ksk
Bill EBAY Europe Denmark Lekh Sip
Charles WM Asia India Haryana Jhat
Chip WM Asia India Punjab Chista
Chia WM Asia India Punjab Mast
Rule Table
Continent country state district Pass
Asia India ALL ALL Yes
Asia India Punjab ALL NO
Asia India Punjab Mast Yes
I have two tables in Hive. Depending on the rule I have to filter out the data in the data table.
In the rule table there is a column called pass which determines whether a record in data table needs to be filtered or not.
In this example there are different kinds of rules. They are the ones at broader level and at narrow level. The rules at narrow level should not affect the rules at broader level. This means the rules at narrow level is an exception to rules at broader level. For ex: in the rules table, there are 3 records. The first record is the rule at broader level. The other ones are at narrow level.
The first rules says to pass all the records that have country as india,state as any/all and district as any/all. The second rule says to not pass all the records that have country as India, state as punjab and district as any/all. The third rule says to pass all records that have country as India,state as punjab and district as Mast.
The second rule is an exception to first rule. The third rule is an exception to second rule.
Considering the data in the data table and rules in the rules table, the pass columns will be as follows for the Indian(country) records.
Name Company Continent country state district Pass
Tom HP Asia India Assam Kdk Yes
Charles WM Asia India Haryana Jhat Yes
Chia WM Asia India Punjab Mast Yes
Chip WM Asia India Punjab Chista No
This is just an example. In production the data will be different.
How do I implement this using SQL/Sql script?
Help is much appreciated.
Upvotes: 0
Views: 235
Reputation: 1269563
You want the most specific rule. In Hive, you can use multiple left join
s:
select d.*, coalesce(r1.pass, r2.pass, r3.pass)
from data d left join
rules r1
on r1.Continent = d.Continent and
r1.country = d.country and
r1.state = d.state and
r1.district = d.district left join
rules r2
on r2.Continent = d.Continent and
r2.country = d.country and
r2.state = d.state and
r2.district = 'ALL' left join
rules r3
on r3.Continent = d.Continent and
r3.country = d.country and
r3.state = 'ALL' and
r3.district = 'ALL' ;
You might want to continue with the LEFT JOIN
s if 'ALL'
is allowed for continent
and country
.
Upvotes: 1
Reputation: 184
@TomG : Please see the below code if that helps
select * from TEMP_TESTING where country ='India' and district<>'Chista'
union
(select * from TEMP_TESTING where country ='India' except
select * from TEMP_TESTING where country ='India' and state='Punjab')
union
select * from TEMP_TESTING where country ='India'and state='Punjab' and district='Mast'
Upvotes: 0