Reputation: 211
I have this query:
SELECT
RTPropertyUniqueIdentifier
,[FA Unique Listing Identifier - Ref ID]
,MLS.[Listing Tracking ID]
, MLS.[Assessor's Parcel Identification Number] as MLS_PARCELID
, PP.ParcelID
, GEOID
, CASE
WHEN PP.PP_ParcelID = MLS.MLS_PARCELID then 'PARCEL MATCH'
Else 'ADDRESS MATCH' END as MATCHTYPE
,[Update Timestamp]
INTO PROPERTY.DBO.PP_MLS_Bridge
FROM STAGE.DBO.STAGE_MLS_BRG MLS
join STAGE.DBO.STAGE_PP_BRG PP on
(
PP.PP_ParcelID = MLS.MLS_PARCELID
) OR (
lower(MLS.StreetNum) = lower(PP.AddNum)
and (lower(PP.AddStreet) like '%'+lower(MLS.Street_Name)+'%' or lower(MLS.Street_Name) like CONCAT('%', lower(PP.AddStreet), '%'))
and (lower(PP.AddUnitNum) like '%' + lower(MLS.Unit) + '%' or lower(MLS.Unit) like CONCAT('%', lower(PP.AddUnitNum), '%'))
and lower(PP.AddCity) = lower(MLS.[Property City])
and lower(PP.AddState) = lower(MLS.[Property State])
and lower(PP.AddZip) = lower(MLS.[Property Zip])
);
The MLS table is 50 Million Records strong. The PP table is 185 Million Records strong. This query will run for over 8 days straight, which means I'm not sure I have it optimized as well. I'm looking for a way to speed it up.
Thanks
UPDATE 2: Updated SQL Statement
INSERT PROPERTY.DBO.PP_MLS_Bridge
SELECT
RTPropertyUniqueIdentifier
,[FA Unique Listing Identifier - Ref ID]
,MLS.[Listing Tracking ID]
, MLS.[Assessor's Parcel Identification Number] as MLS_PARCELID
, PP.ParcelID
, GEOID
, CASE
WHEN PP.PP_ParcelID = MLS.MLS_PARCELID then 'PARCEL MATCH'
Else 'ADDRESS MATCH' END as MATCHTYPE
,[Update Timestamp]
FROM STAGE.DBO.STAGE_MLS_BRG MLS
join STAGE.DBO.STAGE_PP_BRG PP on PP.PP_ParcelID = MLS.MLS_PARCELID
UNION
SELECT
RTPropertyUniqueIdentifier
,[FA Unique Listing Identifier - Ref ID]
,MLS.[Listing Tracking ID]
, MLS.[Assessor's Parcel Identification Number] as MLS_PARCELID
, PP.ParcelID
, GEOID
, CASE
WHEN PP.PP_ParcelID = MLS.MLS_PARCELID then 'PARCEL MATCH'
Else 'ADDRESS MATCH' END as MATCHTYPE
,[Update Timestamp]
FROM STAGE.DBO.STAGE_MLS_BRG MLS
join STAGE.DBO.STAGE_PP_BRG PP on
MLS.StreetNum = PP.AddNum
and (PP.AddStreet like '%'+MLS.Street_Name+'%' or MLS.Street_Name like CONCAT('%', PP.AddStreet, '%'))
and (PP.AddUnitNum like '%' + MLS.Unit + '%' or MLS.Unit like CONCAT('%', PP.AddUnitNum, '%'))
and PP.AddCity = MLS.[Property City]
and PP.AddState = MLS.[Property State]
and PP.AddZip = MLS.[Property Zip];
Upvotes: 0
Views: 50
Reputation: 24903
Ok, you have 2 terrible table scans. So, I can give you common advices:
LOWER
. SQL Server compares strings ignoring casesPP.PP_ParcelID
and MLS.MLS_PARCELID
OR
in your join with 2 selects with UNION
- query optimizer will select better planlower(PP.AddZip) = lower(MLS.[Property Zip])
with some join on simple integer keys (like PP.AddZipId = MLS.PropertyZipId
). Comparing ints is much simplierUpvotes: 1