Guissous Allaeddine
Guissous Allaeddine

Reputation: 445

How to lookup a surrogate key using two columns as business key in SSIS

I am working on a Data warehouse project; I have a design to follow a part of it is shown in the screenshot below:

enter image description here

The source table is as follows:

enter image description here

As the source table shows, the entity column is not unique in its own but unique if combined with the committee title.

Problem: I need to have a business key to lookup the surrogate key "COMMITTEE_SRGT", from the DIM_COMMITTE to the FACT_COMMITTEE using lookup component "if it's the best practice",

My solution: I concatenate the two columns in the staging ETL as shown in the structure script of both the tables in the staging:

FACT table in the staging ETL:

CREATE TABLE [dbo].[STG_STR_COMMITTEES_MEETINGS](
[COMMITTE_BKEY] [nvarchar](520) NULL,
[ENTITY] [nvarchar](255) NULL,
[COMMITTEE_TITLE] [nvarchar](255) NULL,
[NUMBER_MEETING_CONDUCTED] [int] NULL,
[NUMBRER_MEETING_ATTENDED_BY_MEMBER] [int] NULL
)

DIM table in the staging ETL:

CREATE TABLE [dbo].[STG_STR_COMMITTEES](
[COMMITTE_BKEY] [nvarchar](520) NULL,
[ENTITY] [nvarchar](255) NULL,
[COMMITTEE_TITLE] [nvarchar](255) NULL,
[MEMBERSHIP_STATUS] [nvarchar](20) NULL,
[START_DATE] [date] NULL,
[END_DATE] [date] NULL,
[MEMBERS_NAMES] [nvarchar](255) NULL
)

With this solution the lookup was easy, but it was not accepted as it changed the DIM table design.

If there is any better solution "methodology", where I can combine two columns to generate a composite key to lookup the surrogate key from the DIM table to the FACT without changing the datawarehouse design.

EDIT:

Query:

SELECT ISNULL(COMMITTEE_SRGT, 0) AS COMMITTEE_SRGT,
cm.ENTITY, NUMBER_MEETING_CONDUCTED, 
NUMBRER_MEETING_ATTENDED_BY_SAAC_MEMBER
FROM [dbo].[STG_STR_COMMITTEES_MEETINGS] cm
 LEFT JOIN [dbo].[STG_STR_COMMITTEES] c 
      ON cm.ENTITY = c.ENTITY
      AND cm.COMMITTEE_TITLE = c.COMMITTEE_TITLE

Upvotes: 0

Views: 285

Answers (1)

Mark Wojciechowicz
Mark Wojciechowicz

Reputation: 4477

Using the lookup component is not a best practice. I'll outline why that is below. However, you are able to use multiple columns for a look up as shown here:

enter image description here

Why the Lookup Component is problematic

  • It's case sensitive, so you have to be careful when joining on string columns
  • The component will cache the whole table in memory by default, which could be undesirable on large dimensions. It is best to use a query as the source instead of the table, so you only select the columns you need. On large tables, its better to use cache files, which is a lot of extra work for something that's easy to solve in SQL
  • Duplicate values in the lookup will throw a warning, but the component will just select the first value it encounters, kind of randomly

A way better solution

Just do the join in staging. Use a left join to refer to an unknown run if the dimension row does not exist or insert inferred members in advance of the join:

SELECT ISNULL(Committee_SRGT, 0) AS Committee_SRGT,
 ...
FROM [dbo].[STG_STR_COMMITTEES_MEETINGS] cm
     LEFT JOIN [dbo].[STG_STR_COMMITTEES] c 
          ON cm.Entity = c.Entity 
          AND cm.committee_title = c.committee_title

Upvotes: 1

Related Questions