Michael
Michael

Reputation: 47

Need a less expensive query

I have three tables, persons, email, and personemail. Personemail basically has a foreign key to person and email so one person can be linked to multiple email addresses. Also the email table has a field named primaryemail. This field is either 1 or 0. The primary email flag is used for pulling emails into reports/invoices etc.

There was a logic flaw in the UI that allowed users to set no primary email addresses for customers. I have closed the logic flaw but I need a script to force a primary email address for any customer that doesn't have one set. It was decided to set the primary email address to the lowest value for emailid (the primary key in the email table). Below is the script that was written and it works but it is very expensive to run and may cause locks for end users while running. The software is deployed in multiple time zones so even if we run it during the lowest usage time we need it to run as fast as possible.

Here is the current script. It has temp tables and a while loop so you can see it can really be improved upon. My SQL skills need polishing so I am putting it out here for suggestions.

CREATE TABLE #TEMP(PERSONID INT, PRIMARYEMAIL INT,FLAG INT)
CREATE INDEX IDX_TEMP_PERSONID ON #TEMP(PERSONID)

CREATE TABLE #TEMP2(PERSONID INT,PRIMARYEMAIL INT)
CREATE INDEX IDX_TEMP2_PERSONID ON #TEMP2(PERSONID)

--Grab all the person id's that have at least one email addresses that is not primary in the db, also set  a flag for the while loop
INSERT INTO #TEMP
SELECT PE.PersonID, E.primaryEmail ,0 
FROM Account.tbPersonEmail PE WITH (NOLOCK)
LEFT OUTER JOIN Account.tbEmail E ON E.EmailID=PE.EmailID 
WHERE E.primaryEmail=0


--Grab all person ID's that have at least one email address that is primary.
INSERT INTO #TEMP2
SELECT PE.PersonID, E.primaryEmail 
FROM Account.tbPersonEmail PE WITH (NOLOCK)
LEFT OUTER JOIN Account.tbEmail E ON E.EmailID=PE.EmailID
WHERE E.primaryEmail=1


--SELECT * FROM #TEMP2

--Remove any customers that already have a primary email set.
DELETE FROM #TEMP WHERE PERSONID IN (SELECT DISTINCT PERSONID FROM #TEMP2)


--Debug line to see how many customers are affected.
--SELECT * FROM #TEMP


--Perfom a while loop to update the min email ID to primary.
DECLARE @INTFLAG INT
DECLARE @PERSONID INT 
SET @INTFLAG = (SELECT COUNT(*) FROM #TEMP)

--SELECT @INTFLAG

WHILE (@INTFLAG > 0)

BEGIN

SET @PERSONID =(SELECT  TOP(1) PERSONID FROM #TEMP WHERE FLAG=0)

UPDATE Account.tbEmail SET primaryEmail=1 WHERE EmailID=(SELECT MIN(EMAILID) FROM Account.tbPersonEmail where PersonID=@PERSONID)


--Update the flag on the #temp table to grab the next ID
UPDATE #TEMP SET FLAG=1 WHERE PERSONID=@PERSONID

--Reduce the intflag variable that the loop is running off of.
SET @INTFLAG=@INTFLAG-1





END

DROP TABLE #TEMP
DROP TABLE #TEMP2

Upvotes: 2

Views: 210

Answers (4)

Michael
Michael

Reputation: 47

Ended up going with this.

UPDATE Account.tbEmail set primaryEmail=1 


where EmailID in 
(SELECT P.Emailid from (
SELECT  DISTINCT P.PersonID,MIN(P.EmailID)AS EmailID
FROM
(SELECT PE.PersonID, E.primaryEmail,PE.EmailID
FROM Account.tbPersonEmail PE WITH (NOLOCK)
LEFT OUTER JOIN Account.tbEmail E ON E.EmailID=PE.EmailID 
WHERE E.primaryEmail=0 and 
PE.PersonID not in (SELECT Distinct PE2.PersonID 
FROM Account.tbPersonEmail PE2 WITH (NOLOCK)
LEFT OUTER JOIN Account.tbEmail E2 ON E2.EmailID=PE2.EmailID
WHERE E2.primaryEmail=1)
)AS P

GROUP BY P.PersonID ) as P)

Upvotes: 0

KumarHarsh
KumarHarsh

Reputation: 5094

Your logic for making primary email is not ok .Moreover putting aggregate function or rank function on varchar column is more bad. We should know other column as well.

I liked @David suggestion but not script. Try my script with proper Testing before that you should take back up also.

;With CTE as
(
        SELECT PE.PersonID, E.primaryEmail ,E.EmailID
,row_number()over(order by PE.EMAILID )rn 
        FROM Account.tbPersonEmail PE WITH (NOLOCK)
        LEFT OUTER JOIN Account.tbEmail E ON E.EmailID=PE.EmailID 
--why left join
        WHERE E.primaryEmail=0 
)
-- IN CTE you get only those which is not updated.
-- row_number()over(order by PE.EMAILID ) is equivalent to min(emailid)
UPDATE Account.tbEmail SET primaryEmail=1 
from Account.tbEmail A inner join CTE B on A.EmailID=B.EmailID
WHERE B.rn=1

Upvotes: 0

SergeyA
SergeyA

Reputation: 4487

Single query to set primaryEmail=1 for first email for each person except ones who already have primary email:

UPDATE Account.tbEmail E SET E.primaryEmail=1 
WHERE
    E.EmailID in (
        -- get min email id for each person
        SELECT min(PE.EmailID) FROM Account.tbPersonEmail PE 
        -- but exclude persons who already have primary email
        WHERE PE.PersonID NOT IN (
            SELECT PE1.PersonID
            FROM Account.tbPersonEmail PE1
            INNER JOIN Account.tbEmail E1 ON E1.EmailID=PE1.EmailID
            WHERE E1.primaryEmail=1
        )
        GROUP BY PE.PersonID
    )

Upvotes: 1

David Manheim
David Manheim

Reputation: 2626

Creating temporary tables is a very expensive way to do this, and using loops is a bad idea is SQL, as they are slow, since they can't be optimized. The typical method uses subqueries instead. To start, try doing this:

CREATE TABLE #TEMP(PERSONID INT, PRIMARYEMAIL INT,FLAG INT)
CREATE INDEX IDX_TEMP_PERSONID ON #TEMP(PERSONID)

INSERT INTO #TEMP
SELECT PE.PersonID, E.primaryEmail , 0
FROM Account.tbPersonEmail PE WITH (NOLOCK)
LEFT OUTER JOIN Account.tbEmail E ON E.EmailID=PE.EmailID 
WHERE E.primaryEmail=0 and 
PE.PersonID not in (SELECT Distinct PE2.PersonID 
FROM Account.tbPersonEmail PE2 WITH (NOLOCK)
LEFT OUTER JOIN Account.tbEmail E2 ON E.EmailID=PE2.EmailID
WHERE E2.primaryEmail=1)

And then running your while loop. That should help a bit. You can test that this is correct by seeing if #TEMP matches the previous version.

To further optimize, you probably need to rewrite the entire update process as a single query. You also may want to look at this: How can I optimize this SQL query (Using Indexes)?

Upvotes: 0

Related Questions