Reputation: 5822
I executed an update statement along the following lines yesterday:
UPDATE MainTable
Set SomeField = SubsetTable.SomeField
where MainTable.MainTableKey = SubsetTable.MainTableKey
where SubsetTable
is a subset of MainTable
and has the same Primary Key field. MainTable
has roughly 200m records, SubsetTable
has 5m records. MainTableKey
is a GUID.
Both of these table have a clustered index on MainTableKey
.
When I executed this query the first time it took a whopping 14 hours.
Then I added a non-clustered index to MainTableKey
on both tables. Now it takes 30 minutes.
Does anyone have any ideas on why the performance gain would be so dramatic?
Upvotes: 0
Views: 243
Reputation: 110151
I bet if you look at the execution plans:
The first query is a merge join which involved reading both tables completely.
200M rows + 5M rows = 205M rows.
205M rows / 14 hours = 4067 rows per second.
The second query is a nested loop join which reads the whole small table and seeks into the large table for each small table row.
5M + 5M rows = 10M rows.
10M rows / 40 minutes = 4166rows per second.
That these rates are approximately equal, supports my theory about which rows are read.
You don't have to guess: Run the queries with SET STATISTICS IO ON
, and/or look at the execution plans.
Upvotes: 1