Reputation: 2682
I have a table in SQL Server 2008 (SP2) containing 30 million rows, with a table size of 150 GB. There are a couple of int columns and two nvarchar(max) columns: one containing text (from 1 to 30,000 characters) and one containing XML (up to 100,000 characters).
The table doesn't have any primary key or indexes (it is a staging table). So I am running this query:
UPDATE [dbo].[stage_table]
SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1])-1);
The query has been running for 3 hours (and it is still not completed), which I think is too long. Is it? I can see a constant read rate of 5 MB/s and a write rate of 10 MB/s to the .mdf file.
How can I find out why the query is taking so long? The "server" is an i7 with 24 GB of RAM and SATA disks in RAID 10.
Updated:
The table contains one int column, two nvarchar(20) columns and two nvarchar(max) columns. Column1 and Column2 in the UPDATE statement above are the nvarchar(20) columns. The "big" columns are not updated.
Many thanks!
Upvotes: 5
Views: 6750
Reputation: 291
There are a few options here, but without more information about what you intend to do with the data after this update is performed, Larry Lustig's answer sounds like the most appropriate. Other options follow:
Upvotes: 1
Reputation: 12833
I haven't done this kind of processing in SQL Server, so I'm not sure whether the advice fully applies, but I'm confident enough to suggest you try it.
What I usually do in Oracle is to avoid updates entirely when processing ALL rows in a situation like the one you describe (single user, batch event).
Either I migrate the logic from the update statement back into the statement that inserted the rows, or, if that is not possible, I create a new table and put the update logic in the select list. For example, instead of doing
UPDATE [dbo].[stage_table]
SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1])-1);
I would do:
create table stage_table2 as
select column1
,substring(column1, 1, charindex('.', column1)-1) as column2
,column3
,column4
from stage_table;
drop table stage_table;
alter table stage_table2 rename to stage_table;
-- re-create indexes and constraints, optionally gather statistics
I could also do this with parallel query and the nologging option to generate very little redo and no undo at all, which would outperform an update statement by such a large margin it's not even funny :) Of course this is down to Oracle internals, but I think it would be possible to replicate it with SQL Server as well. There is one thing in your description that may make this a less efficient approach, though: you have some really large text columns that you would have to "drag along" in the CTAS statement.
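For what it's worth, a rough SQL Server translation of the same idea might look like the sketch below. This is only a sketch: it assumes the database is in simple or bulk-logged recovery (so SELECT ... INTO is minimally logged) and that stage_table2 is a free name.

SELECT  column1,
        SUBSTRING(column1, 1, CHARINDEX('.', column1) - 1) AS column2,
        column3,
        column4
INTO    dbo.stage_table2      -- new heap, minimally logged under simple/bulk-logged recovery
FROM    dbo.stage_table;

DROP TABLE dbo.stage_table;

EXEC sp_rename 'dbo.stage_table2', 'stage_table';
-- re-create indexes and constraints afterwards, as in the Oracle version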
Also, you need to investigate your hardware setup because it is not fit to work with the amount of data you have thrown at it. Either there is something wrong with the configuration, or you have a lot of other activity going on:
I can see a constant read rate of 5 MB/s and a write rate of 10 MB/s to the .mdf file.
I can beat that on my girlfriend's two-year-old laptop. Given a read speed of 5 MB/s and a table of 150 GB, it would take about 8.5 hours just to scan through the table once (150 GB ≈ 153,600 MB; at 5 MB/s that is roughly 30,700 seconds). This assumes the database adds 0% overhead, which is not the case.
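If you want to see where that time is going on the SQL Server side, one rough starting point (just a sketch using the standard DMVs) is to look at the cumulative per-file I/O stalls:

SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.physical_name,
        vfs.num_of_bytes_read,
        vfs.num_of_bytes_written,
        vfs.io_stall_read_ms,   -- total ms spent waiting on reads for this file
        vfs.io_stall_write_ms   -- total ms spent waiting on writes
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
  ON    mf.database_id = vfs.database_id
 AND    mf.file_id = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;

Large stall figures relative to the bytes transferred point at the disks or the RAID configuration rather than at the query itself.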
Upvotes: 1
Reputation: 18239
A practical way to test whether something is out of the ordinary is to update only some of the data. Write a view that selects, say, the top 10,000 rows, and run the update against the view.
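For instance, a minimal sketch (the view name v_stage_sample is just a placeholder):

CREATE VIEW dbo.v_stage_sample
AS
SELECT TOP (10000) column1, column2
FROM   dbo.stage_table;
GO

-- Time this in isolation and extrapolate to the full 30 million rows.
UPDATE dbo.v_stage_sample
SET    column2 = SUBSTRING(column1, 1, CHARINDEX('.', column1) - 1);

On SQL Server 2005 and later you can also skip the view and write UPDATE TOP (10000) dbo.stage_table ... directly.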
If updating 10,000 rows takes what you would expect to be "normal" for your server, then it would follow that this is simply "a lot of data to update".
If this small update seems unduly long, then investigate further.
At least this gives you a decent testing ground.
Upvotes: 1
Reputation: 50970
Honestly, that's a huge amount of work you're doing (text searching and replacing on 150 gigabytes). If the staged data originated outside the database, you might consider doing the text operations there, without any of the database overhead.
Upvotes: 3