Franc Amour
Franc Amour

Reputation: 301

Improving performance of UPDATE query with subqueries

I have a table of ZipCodes, that contains US and Canadian postal codes and related information - including Latitude and Longitude.

I have records being inserted into a table from a separate application that I cannot modify. These records lack latitude and longitude data, so I'd like to populate the records accordingly. (Before anyone asks, doing a JOIN with the ZipCodes table at runtime didn't seem to be an option because of performance, which is why I've ended up where i am.)

I have the query shown below, which is to be set to run as a scheduled task, that should update any records that have not yet had their geo data looked up. This query (100 at a time) takes about 1m 35s to run.

I am seeking any and all options to what i'm doing here, that may improve performance.

I resorted to a scheduled task because I fear having the entire server being brought to its knees with INSTEAD OF INSERT or AFTER INSERT triggers to do the same thing... but that would have been my preference. Given the current performance issues, triggers seem out of the question.

The ZipCodes table has a non-clustered index on CityName and ProvinceAbbr, with additional included columns PostalCode, Latitude and Longitude. I have Allow Row Locks and Allow Page Locks set to false. The data in this table will change MAYBE once a quarter, so dirty reads are fine.

I can provide execution plan results if necessary... but not sure how exactly so give me a hint as to how to generate copy/paste material. =D

The ZipCodes table has 947,172 records, and the LoadsAvail table has around 38k records at any given time... records are being inserted, updated, and deleted in real time/constantly, sometimes in larger batches (I'd say maximum of 20 inserted at a time, usually more like one or two at a time).

Once most of the records have their geo data in place, there will probably be less than a 100 records being updated every minute, but I'd like to know if there's just a better way to do all this.

UPDATE TOP (100)
    LoadsAvail

SET
    coordinatesChecked = 1,
    FromLatitude =  (   SELECT TOP (1) Latitude
                                        FROM ZipCodes AS ZipCodes_1 WITH(NOLOCK)
                                        WHERE (CityName = loadsavail.FromCity) AND (ProvinceAbbr = loadsavail.FromState)
                                    ),
    FromLongitude = (   SELECT TOP (1) Longitude
                                        FROM ZipCodes AS ZipCodes_2 WITH(NOLOCK)
                                        WHERE (CityName = loadsavail.FromCity) AND (ProvinceAbbr = loadsavail.FromState)
                                    ),
    ToLatitude =    (   SELECT TOP (1) Latitude
                                    FROM ZipCodes AS ZipCodes_3 WITH(NOLOCK)
                                    WHERE        (CityName = loadsavail.ToCity) AND (ProvinceAbbr = loadsavail.toState)
                                ),
    ToLongitude =   (   SELECT TOP (1) Longitude
                                    FROM ZipCodes AS ZipCodes_4 WITH(NOLOCK)
                                    WHERE        (CityName = loadsavail.ToCity) AND (ProvinceAbbr = loadsavail.toState)
                                )
WHERE
    coordinatesChecked = 0

Upvotes: 1

Views: 530

Answers (1)

Hadi
Hadi

Reputation: 37313

You can use Joins instead of subqueries:

UPDATE T1
SET T1.coordinatesChecked = 1,
    T1.FromLatitude =  T2.Latitude,
    T1.FromLongitude = T2.Longitude
    T1.ToLatitude =    T3.Latitude,
    T1.ToLongitude =   T3.Longitude
FROM            LoadsAvail AS T1 LEFT JOIN ZipCodes AS T2 ON T1.FromCity = T2.CityName AND T1.FromState = T2.ProvinceAbbr
                                 LEFT JOIN ZipCodes AS T3 ON T1.toCity = T3.CityName AND T1.toState = T3.ProvinceAbbr
WHERE    T1.coordinatesChecked = 0

Upvotes: 3

Related Questions