Sayandip Ghatak
Sayandip Ghatak

Reputation: 303

Join two tables based on multiple column comparison

I have a country dimension table COUNTRY_DIM like below:

ID SHORT_NAME FULL_NAME ALTERNATE_NAME
US USA United States of America United States
UK UK United Kingdom Great Britain
IN India Republic of India
ROK South Korea Republic of Korea
DE Germany Federal Republic of Germany Deutschland

I have another table COUNTRY_FACT where I have Country Name and some values like below:

COUNTRY_NAME VALUE
United States 30
India 6
United Kingdom 10
South Korea 2
Republic of Korea 1
Germany 4
USA 2

I want to join both the tables on SHORT_NAME first, if not matched then FULL_NAME if not matched then ALTERNATE_NAME and derive the Country ID.

So the output will be like below :

COUNTRY_ID VALUE
US 32
IN 6
UK 10
ROK 3
DE 4

I am using the below query:

select dim.country_id,sum(fact.value) value
from contry_fact fact
inner join country_dim dim
on
case when upper(country_name) = upper(short_name) then 1
when upper(country_name) = upper(full_name) then 1
when upper(country_name) = upper(alternate_name) then 1
else 0 end = 1
group by dim.country_id

Is this the correct method to get the desired output or we can achieve it another way ?

I am using Oracle 19c

Upvotes: 1

Views: 94

Answers (3)

ORA-01017
ORA-01017

Reputation: 1075

Yet another solution could be to use the row_number window function as follows:

select country_id, sum(value) as value from
(SELECT d.id country_id,
       f.value,
       row_number() 
        over (partition by f.rowid 
              order by case when d.short_name = f.country_name then 1 
                            when d.full_name = f.country_name then 2 
                            else 3 end) as rn
  FROM country_dim d
  join country_fact f 
    ON d.short_name = f.country_name
       OR d.full_name = f.country_name
       OR d.alternate_name = f.country_name ) t
where t.rn = 1
group by country_id;
   

Upvotes: 0

Paul W
Paul W

Reputation: 11558

Don't try to join once when you have multiple candidate join columns or any kind of conditional logic in the join clause (like that CASE statement). It creates a performance headache and limits the join methods Oracle can use.

Instead, join separately for each candidate join column, then use COALESCE in the order you want matching the priority to select the desired dimension:

SELECT COALESCE(d1.country_id,d2.country_id,d3.country_id) country_id,
       SUM(f.value) value
  FROM country_fact f
       LEFT OUTER JOIN country_dim d1 ON UPPER(f.short_name) = UPPER(d1.country_name)
       LEFT OUTER JOIN country_dim d2 ON UPPER(f.full_name) = UPPER(d2.country_name)
       LEFT OUTER JOIN country_dim d3 ON UPPER(f.alternate_name) = UPPER(d3.country_name)
 GROUP BY COALESCE(d1.country_id,d2.country_id,d3.country_id)
       

However that you're doing an aggregation suggests this is using an actual fact table rather than obtaining a dimension key to load a fact table from staging. If that's the case, you really shouldn't have text country strings in your fact table at all. It should already be replaced by the surrogate country_id at ETL time so this kind of query complexity by users isn't needed.

Upvotes: 2

Littlefoot
Littlefoot

Reputation: 143013

Not very elegant, but does the job.

SQL> select d.id, sum(f.value)
  2  from country_dim d join country_fac f on
  3    f.country_name in (d.short_name, d.full_name, d.alternate_name)
  4  group by d.id
  5  order by d.id;

ID  SUM(F.VALUE)
--- ------------
DE             4
IN             6
ROK            3
UK            10
US            32

SQL>

Alternatively (and worse, as it scans the same tables too many times),

SQL> with temp as
  2    (select d.id, f.value
  3       from country_dim d join country_fac f on f.country_name = d.short_name
  4     union all
  5     select d.id, f.value
  6       from country_dim d join country_fac f on f.country_name = d.full_name
  7     union all
  8     select d.id, f.value
  9       from country_dim d join country_fac f on f.country_name = d.alternate_name
 10  )
 11  select id, sum(value)
 12  from temp
 13  group by id
 14  order by id;

ID  SUM(VALUE)
--- ----------
DE           4
IN           6
ROK          3
UK          10
US          32

SQL>

Upvotes: 2

Related Questions