arcee123
arcee123

Reputation: 211

Trying to query optimize a join based on four fields with IN joins

I have this query:

SELECT 
      RTPropertyUniqueIdentifier
      ,[FA Unique Listing Identifier - Ref ID]
      ,MLS.[Listing Tracking ID]
      , MLS.[Assessor's Parcel Identification Number] as MLS_PARCELID
      , PP.ParcelID
      , GEOID
      , CASE 
        WHEN PP.PP_ParcelID = MLS.MLS_PARCELID then 'PARCEL MATCH'
        Else 'ADDRESS MATCH' END as MATCHTYPE
      ,[Update Timestamp]
          INTO PROPERTY.DBO.PP_MLS_Bridge
  FROM STAGE.DBO.STAGE_MLS_BRG MLS
  join STAGE.DBO.STAGE_PP_BRG PP on 
    (
        PP.PP_ParcelID = MLS.MLS_PARCELID
    ) OR (
        lower(MLS.StreetNum) = lower(PP.AddNum)
        and (lower(PP.AddStreet) like '%'+lower(MLS.Street_Name)+'%' or lower(MLS.Street_Name) like CONCAT('%', lower(PP.AddStreet), '%'))
        and (lower(PP.AddUnitNum) like '%' + lower(MLS.Unit) + '%' or lower(MLS.Unit) like CONCAT('%', lower(PP.AddUnitNum), '%'))
        and lower(PP.AddCity) = lower(MLS.[Property City])
        and lower(PP.AddState) = lower(MLS.[Property State])
        and lower(PP.AddZip) = lower(MLS.[Property Zip])
);

The MLS table is 50 Million Records strong. The PP table is 185 Million Records strong. This query will run for over 8 days straight, which means I'm not sure I have it optimized as well. I'm looking for a way to speed it up.

Thanks

Update 1: Execution Plan: enter image description here

UPDATE 2: Updated SQL Statement

INSERT PROPERTY.DBO.PP_MLS_Bridge
SELECT 
      RTPropertyUniqueIdentifier
      ,[FA Unique Listing Identifier - Ref ID]
      ,MLS.[Listing Tracking ID]
      , MLS.[Assessor's Parcel Identification Number] as MLS_PARCELID
      , PP.ParcelID
      , GEOID
      , CASE 
        WHEN PP.PP_ParcelID = MLS.MLS_PARCELID then 'PARCEL MATCH'
        Else 'ADDRESS MATCH' END as MATCHTYPE
      ,[Update Timestamp]
  FROM STAGE.DBO.STAGE_MLS_BRG MLS
  join STAGE.DBO.STAGE_PP_BRG PP on PP.PP_ParcelID = MLS.MLS_PARCELID
UNION
SELECT 
      RTPropertyUniqueIdentifier
      ,[FA Unique Listing Identifier - Ref ID]
      ,MLS.[Listing Tracking ID]
      , MLS.[Assessor's Parcel Identification Number] as MLS_PARCELID
      , PP.ParcelID
      , GEOID
      , CASE 
        WHEN PP.PP_ParcelID = MLS.MLS_PARCELID then 'PARCEL MATCH'
        Else 'ADDRESS MATCH' END as MATCHTYPE
      ,[Update Timestamp]
  FROM STAGE.DBO.STAGE_MLS_BRG MLS
  join STAGE.DBO.STAGE_PP_BRG PP on
        MLS.StreetNum = PP.AddNum
        and (PP.AddStreet like '%'+MLS.Street_Name+'%' or MLS.Street_Name like CONCAT('%', PP.AddStreet, '%'))
        and (PP.AddUnitNum like '%' + MLS.Unit + '%' or MLS.Unit like CONCAT('%', PP.AddUnitNum, '%'))
        and PP.AddCity = MLS.[Property City]
        and PP.AddState = MLS.[Property State]
        and PP.AddZip = MLS.[Property Zip];

Execution Plan: Part 1 enter image description here

Part 2 enter image description here

Upvotes: 0

Views: 50

Answers (1)

Backs
Backs

Reputation: 24903

Ok, you have 2 terrible table scans. So, I can give you common advices:

  1. Remove LOWER. SQL Server compares strings ignoring cases
  2. Add indexes on PP.PP_ParcelID and MLS.MLS_PARCELID
  3. Replace OR in your join with 2 selects with UNION - query optimizer will select better plan
  4. Try to replace all joins on string like lower(PP.AddZip) = lower(MLS.[Property Zip]) with some join on simple integer keys (like PP.AddZipId = MLS.PropertyZipId). Comparing ints is much simplier

Upvotes: 1

Related Questions