Reputation: 303
I have a country dimension table COUNTRY_DIM like below:
ID | SHORT_NAME | FULL_NAME | ALTERNATE_NAME |
---|---|---|---|
US | USA | United States of America | United States |
UK | UK | United Kingdom | Great Britain |
IN | India | Republic of India | |
ROK | South Korea | Republic of Korea | |
DE | Germany | Federal Republic of Germany | Deutschland |
I have another table COUNTRY_FACT where I have Country Name and some values like below:
COUNTRY_NAME | VALUE |
---|---|
United States | 30 |
India | 6 |
United Kingdom | 10 |
South Korea | 2 |
Republic of Korea | 1 |
Germany | 4 |
USA | 2 |
I want to join both the tables on SHORT_NAME first, if not matched then FULL_NAME if not matched then ALTERNATE_NAME and derive the Country ID.
So the output will be like below :
COUNTRY_ID | VALUE |
---|---|
US | 32 |
IN | 6 |
UK | 10 |
ROK | 3 |
DE | 4 |
I am using the below query:
select dim.country_id,sum(fact.value) value
from contry_fact fact
inner join country_dim dim
on
case when upper(country_name) = upper(short_name) then 1
when upper(country_name) = upper(full_name) then 1
when upper(country_name) = upper(alternate_name) then 1
else 0 end = 1
group by dim.country_id
Is this the correct method to get the desired output or we can achieve it another way ?
I am using Oracle 19c
Upvotes: 1
Views: 94
Reputation: 1075
Yet another solution could be to use the row_number
window function as follows:
select country_id, sum(value) as value from
(SELECT d.id country_id,
f.value,
row_number()
over (partition by f.rowid
order by case when d.short_name = f.country_name then 1
when d.full_name = f.country_name then 2
else 3 end) as rn
FROM country_dim d
join country_fact f
ON d.short_name = f.country_name
OR d.full_name = f.country_name
OR d.alternate_name = f.country_name ) t
where t.rn = 1
group by country_id;
Upvotes: 0
Reputation: 11558
Don't try to join once when you have multiple candidate join columns or any kind of conditional logic in the join clause (like that CASE
statement). It creates a performance headache and limits the join methods Oracle can use.
Instead, join separately for each candidate join column, then use COALESCE
in the order you want matching the priority to select the desired dimension:
SELECT COALESCE(d1.country_id,d2.country_id,d3.country_id) country_id,
SUM(f.value) value
FROM country_fact f
LEFT OUTER JOIN country_dim d1 ON UPPER(f.short_name) = UPPER(d1.country_name)
LEFT OUTER JOIN country_dim d2 ON UPPER(f.full_name) = UPPER(d2.country_name)
LEFT OUTER JOIN country_dim d3 ON UPPER(f.alternate_name) = UPPER(d3.country_name)
GROUP BY COALESCE(d1.country_id,d2.country_id,d3.country_id)
However that you're doing an aggregation suggests this is using an actual fact table rather than obtaining a dimension key to load a fact table from staging. If that's the case, you really shouldn't have text country strings in your fact table at all. It should already be replaced by the surrogate country_id
at ETL time so this kind of query complexity by users isn't needed.
Upvotes: 2
Reputation: 143013
Not very elegant, but does the job.
SQL> select d.id, sum(f.value)
2 from country_dim d join country_fac f on
3 f.country_name in (d.short_name, d.full_name, d.alternate_name)
4 group by d.id
5 order by d.id;
ID SUM(F.VALUE)
--- ------------
DE 4
IN 6
ROK 3
UK 10
US 32
SQL>
Alternatively (and worse, as it scans the same tables too many times),
SQL> with temp as
2 (select d.id, f.value
3 from country_dim d join country_fac f on f.country_name = d.short_name
4 union all
5 select d.id, f.value
6 from country_dim d join country_fac f on f.country_name = d.full_name
7 union all
8 select d.id, f.value
9 from country_dim d join country_fac f on f.country_name = d.alternate_name
10 )
11 select id, sum(value)
12 from temp
13 group by id
14 order by id;
ID SUM(VALUE)
--- ----------
DE 4
IN 6
ROK 3
UK 10
US 32
SQL>
Upvotes: 2