TomG
TomG

Reputation: 291

Filtering out records in a SQL table using rules in another SQL table

Data Table

Name    Company     Continent   country     state   district    
Tom     HP          Asia        India       Assam   Kdk    
George  SAP         Africa      Sudan       Chak    ksk     
Bill    EBAY        Europe      Denmark     Lekh    Sip     
Charles WM          Asia        India       Haryana Jhat    
Chip    WM          Asia        India       Punjab  Chista    
Chia    WM          Asia        India       Punjab  Mast

Rule Table

Continent   country     state   district    Pass    
Asia        India       ALL     ALL         Yes    
Asia        India       Punjab  ALL         NO
Asia        India       Punjab  Mast        Yes

I have two tables in Hive. Depending on the rule I have to filter out the data in the data table.

In the rule table there is a column called pass which determines whether a record in data table needs to be filtered or not.

In this example there are different kinds of rules. They are the ones at broader level and at narrow level. The rules at narrow level should not affect the rules at broader level. This means the rules at narrow level is an exception to rules at broader level. For ex: in the rules table, there are 3 records. The first record is the rule at broader level. The other ones are at narrow level.

The first rules says to pass all the records that have country as india,state as any/all and district as any/all. The second rule says to not pass all the records that have country as India, state as punjab and district as any/all. The third rule says to pass all records that have country as India,state as punjab and district as Mast.

The second rule is an exception to first rule. The third rule is an exception to second rule.

Considering the data in the data table and rules in the rules table, the pass columns will be as follows for the Indian(country) records.

Name    Company     Continent   country     state   district    Pass    
Tom     HP          Asia        India       Assam   Kdk         Yes    
Charles WM          Asia        India       Haryana Jhat        Yes    
Chia    WM          Asia        India       Punjab  Mast        Yes
Chip    WM          Asia        India       Punjab  Chista      No

This is just an example. In production the data will be different.

How do I implement this using SQL/Sql script?

Help is much appreciated.

Upvotes: 0

Views: 235

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269563

You want the most specific rule. In Hive, you can use multiple left joins:

select d.*, coalesce(r1.pass, r2.pass, r3.pass)
from data d left join
     rules r1
     on r1.Continent = d.Continent and
        r1.country = d.country and
        r1.state = d.state and
        r1.district = d.district left join
     rules r2
     on r2.Continent = d.Continent and
        r2.country = d.country and
        r2.state = d.state  and
        r2.district = 'ALL'  left join
     rules r3
     on r3.Continent = d.Continent and
        r3.country = d.country and
        r3.state = 'ALL' and
        r3.district = 'ALL' ;

You might want to continue with the LEFT JOINs if 'ALL' is allowed for continent and country.

Upvotes: 1

karan arora
karan arora

Reputation: 184

@TomG : Please see the below code if that helps

select * from TEMP_TESTING  where country ='India'  and district<>'Chista'
union 
(select * from TEMP_TESTING where country ='India'   except
select * from TEMP_TESTING where  country ='India' and state='Punjab')
union 
select * from TEMP_TESTING where country ='India'and state='Punjab' and district='Mast'

Upvotes: 0

Related Questions