Nick
Nick

Reputation: 1

SQL Joining Based on Common Calculated Values

Using SQL Server 2012. Querying data from two separate tables with non-overlapping key values. I was able to calculate a common key using CASE statements that takes the following form:

Query #1:

select 
   "Region" = case 
                 when Facility = 100 then Region = 'North'
                 when Facility = 200 then Region = 'South'
                 when Facility = 300 then Region = 'Midwest'
                 else Null 
              end, 
   Other_Variable_1, Other_Variable_2, ...
from 
   Data1

Query #2:

select 
   "Region" = case 
                 when Facility = 63 then Region = 'North'
                 when Facility = 67 then Region = 'South"
                 when Facility = 89 then Region = 'Midwest'
                 else Null 
              end, 
   Other_Variable_A, Other_Variable_A, ...
from 
   Data2

I tried to link the two queries by running a JOIN statement on the two CASE statements as follows:

select 
    a.Other_Variable_1, b.Other_Variable_B
from 
    Data1 a
inner join 
    Data2 b on (case
                   when a.Facility = 100 then Region = 'North'
                   when a.Facility = 200 then Region = 'South'
                   when a.Facility = 300 then Region = 'Midwest'
                   else Null 
                end) = (case
                           when b.Facility = 63 then Region = 'North'
                           when b.Facility = 67 then Region = 'South"
                           when b.Facility = 89 then Region = 'Midwest'
                           else Null 
                         end)

The query ran for over an hour before I gave up. Is there a better way to join the two case statements without publishing two separate tables and then running a third query?

Upvotes: 0

Views: 56

Answers (2)

Ray Krungkaew
Ray Krungkaew

Reputation: 6963

I know you don't want extra tables. I just want to share my experience on the same case as yours. This method is more flexible that user can change mapping between Facility and Region. We have to join two tables (10 millions row) with computed value. We solve the performance issue by.

1.Add 1 "Region" column for each table
2.Every night there will be data processing that will update those columns base on specified value by user via MDS (I assume only user know the relation between facility and region)

create table Data1_Mapping_Facility_Region
(
    Facility int,
    Region nvarchar(60)
)
create index IX_Data1_Mapping_Facility_Region_Facility on Data1 (Facility) include (region)

create table Data2_Mapping_Facility_Region
(
    Facility int,
    Region nvarchar(60)
)
create index IX_Data2_Mapping_Facility_Region_Facility on Data2 (Facility) include (region)

insert into Data1_Mapping_Facility_Region values(100, 'North'), (200, 'South'), (300, 'Midwest')
insert into Data2_Mapping_Facility_Region values(63, 'North'), (67, 'South'), (89, 'Midwest')

Mapping_Data1_Facility_Region

100  'North'
200  'South'
300  'Midwest'

Mapping_Data2_Facility_Region

63   'North'
67   'South'
89   'Midwest'

Create index for Data1 and Data2

create index IX_Data1_Region on Data1 (Region) include (Other_Variable_1)
create index IX_Data2_Region on Data2 (Region) include (Other_Variable_B)

update Data1 and Data2

update  data
set     data.Region = map.Region
from    Data1 data
            inner join Data1_Mapping_Facility_Region map
            on data.Facility = map.Facility

update  data
set     data.Region = map.Region
from    Data2 data
            inner join Data2_Mapping_Facility_Region map
            on data.Facility = map.Facility

Change query to

select a.Other_Variable_1, b.Other_Variable_B
from   Data1 a
       inner join Data2 b on a.Region = b.Region

Compare query plan enter image description here

Upvotes: 1

davidhigh
davidhigh

Reputation: 15508

Yes, try to do it completely without the CASE statement and directly map the values onto each other:

select a.Other_Variable_1, b.Other_Variable_B
from Data1 a
inner join Data2 b
on (a.Facility = 100 and b.Facility = 63)
or (a.Facility = 200 and b.Facility = 67)
or (a.Facility = 300 and b.Facility = 89)

Next, make sure that no primary key is missing -- I have the feeling this is the case here. Otherwise you will get a cartesian product-like join of the data sets in your tables.

Upvotes: 0

Related Questions