Reputation: 1293
I have this query that's taking forever to run. The table contains about 7 million rows. Everything else I'm doing with it (it's a "temporary" permanent table) is going relatively quickly (an hour or so) while this one UPDATE belong took 7 hours! We have SQL Server 2014.
DOI
is an NVARCHAR(72)
and has a non-unique CLUSTERED
index on it. Affiliations
is a VARCHAR(8000)
. I'm not really allowed to change these data types. Affiliations
has an index on it as an include. We couldn't do a "regular" index since the field is so big.
CREATE NONCLUSTERED INDEX IX_Affiliations
ON TempSourceTable (DOI) INCLUDE (Affiliations);
What the statement below does is set a bit field to 1 if all the records for a DOI
have the same value in their Affiliations
column. This table has multiple records per DOI
value, and we want to know if the Affiliations
column is the same for all of the records with that same DOI
or not.
Is there any way I can speed this up, by writing a different query, a different index or am I going about this all wrong?
UPDATE S
SET AffiliationsSameForAllDOI = 1
FROM TempSourceTable S
WHERE NOT EXISTS (SELECT 1
FROM TempSourceTable S2
WHERE S2.DOI = S.DOI
AND S2.Affiliations <> S.Affiliations)
Upvotes: 1
Views: 2468
Reputation: 5202
I'd hope this to perform better than the other offerings, since it should execute in a single scan over the index. Also, the min/max 'trick' avoids having to collect and maintain each distinct value.
WITH X AS
(
SELECT *,
AffiliationsSameForAllDOI_New =
CASE WHEN MAX(Affiliations) OVER (PARTITION BY DOI)
= MIN(Affiliations) OVER (PARTITION BY DOI)
THEN 1
ELSE 0
END
FROM TempSourceTable
)
UPDATE X
SET AffiliationsSameForAllDOI = AffiliationsSameForAllDOI_New
WHERE AffiliationsSameForAllDOI_New = 1
Upvotes: 0
Reputation: 93724
Here is another way
SUB-QUERY
version
UPDATE TempSourceTable
SET AffiliationsSameForAllDOI = 1
WHERE doi IN (SELECT doi
FROM TempSourceTable S
GROUP BY DOI
HAVING COUNT(DISTINCT Affiliations) = 1)
EXISTS
Version
UPDATE TempSourceTable S
SET AffiliationsSameForAllDOI = 1
WHERE EXISTS (SELECT 1
FROM TempSourceTable S1
Where s1.DOI = s.DOI
HAVING COUNT(DISTINCT Affiliations) = 1)
INNER JOIN
Version
UPDATE S
SET AffiliationsSameForAllDOI = 1
FROM TempSourceTable S
INNER JOIN (SELECT doi
FROM TempSourceTable
GROUP BY DOI
HAVING COUNT(DISTINCT Affiliations) = 1) S1
ON S.DOI = S1.DOI
Upvotes: 6
Reputation: 17915
update TempSourceTable
set AffiliationsSameForAllDOI = 1
where DOI in (
select DOI
from TempSourceTable
group by DOI
having count(distinct Affiliations) = 1
)
Depending on what your data looks like maybe you'd have some luck with performance by creating a computed column that strips out say the first 16 characters from Affiliations
or just using the checksum()
and then indexing on that column instead. Perhaps it would look something like this:
update TempSourceTable
set AffiliationsSameForAllDOI = 1
where DOI in (
select DOI
from TempSourceTable
where DOI in (
select DOI
from TempSourceTable
group by DOI
having count(distinct AffiliationsChecksum) = 1
)
group by DOI
having count(distinct Affiliations) = 1
)
Upvotes: 4