rrejc

Reputation: 2682

Updating a table takes a very long time

I have a table in SQL Server 2008 (SP2) with 30 million rows and a size of 150 GB. It has a couple of int columns and two nvarchar(max) columns: one containing text (1 to 30,000 characters) and one containing XML (up to 100,000 characters).

The table doesn't have any primary keys or indexes (it is a staging table). I am running this query:

UPDATE [dbo].[stage_table] 
SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1])-1);

The query has been running for 3 hours (and it is still not complete), which I think is too long. Is it? I can see a constant read rate of 5 MB/s and a write rate of 10 MB/s to the .mdf file.

How can I find out why the query is taking so long? The "server" is an i7 with 24 GB of RAM and SATA disks in RAID 10.

Updated:

The table contains one int column, two nvarchar(20) columns and two nvarchar(max) columns. Column1 and Column2 in the UPDATE statement above are the nvarchar(20) columns. The "big" columns are not updated.

Many thanks!

Upvotes: 5

Views: 6750

Answers (5)

Bradford Hoagland

Reputation: 291

There are a few options here. Without more information about what you intend to do with the data after this update, Larry Lustig's answer sounds like the most appropriate. Other options follow:

  • Create column2 as a computed column instead of a physical column.
  • Perform the calculation as you pull the data from the staging table (which is also what would take place if you go with the previous bullet).
  • Index column2 and then perform the updates in chunks of 10,000 records or so where column2 is null. This will keep the implicit transaction size down, which is probably what is currently killing your performance. A sketch follows below.
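
A rough sketch of that batched approach (the filtered index, the batch size, and the extra IS NOT NULL guard are assumptions, not taken from the question; rows where column1 contains no '.' will still fail in SUBSTRING, exactly as in the original statement):

-- Sketch only: filtered index plus batched updates keyed on column2 IS NULL.
CREATE NONCLUSTERED INDEX IX_stage_table_column2
    ON [dbo].[stage_table] ([column2])
    WHERE [column2] IS NULL;        -- filtered index (SQL Server 2008 and later)

WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) [dbo].[stage_table]
    SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1]) - 1)
    WHERE [column2] IS NULL
      AND [column1] IS NOT NULL;    -- NULL source rows would stay NULL and keep being picked up

    IF @@ROWCOUNT = 0 BREAK;        -- nothing left to update
END;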

Upvotes: 1

Ronnis

Reputation: 12833

I haven't done this kind of processing in SQL Server, so I'm not sure whether this advice fully applies, but I'm confident enough to suggest you try it.

What I usually do in Oracle is to avoid updates entirely when processing ALL rows in a situation like the one you describe (single user, batch event).

Either I migrate the logic from the update statement back into the statement that inserted the rows, or, if that is not possible, I create a new table and put the update logic in the select list. For example, instead of:

UPDATE [dbo].[stage_table] 
SET [column2] = SUBSTRING([column1], 1, CHARINDEX('.', [column1])-1);

I would do:

create table stage_table2 as
   select column1
         ,substring(column1, 1, charindex('.', column1)-1) as column2
         ,column3
         ,column4
     from stage_table;

drop table stage_table;

alter table stage_table2 rename to stage_table;
-- re-create indexes and constraints, optionally gather statistics

I could also do this with parallel query and the nologging option to generate very little redo and no undo at all, which would outperform an update statement by such a large margin that it's not even funny :) Of course this is down to Oracle internals, but I think it would be possible to replicate it with SQL Server as well. There is one thing in your description that may make this approach less efficient, though: you have some really large text columns that you would have to "drag along" in the CTAS statement.
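
In SQL Server, the rough equivalent of that CTAS would be SELECT ... INTO, which is minimally logged under the simple or bulk-logged recovery model. A sketch of the same idea (the extra column names are just the placeholders used above):

SELECT column1
      ,SUBSTRING(column1, 1, CHARINDEX('.', column1) - 1) AS column2
      ,column3
      ,column4
INTO   dbo.stage_table2
FROM   dbo.stage_table;

DROP TABLE dbo.stage_table;

EXEC sp_rename 'dbo.stage_table2', 'stage_table';
-- re-create indexes and constraints as needed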

Also, you need to investigate your hardware setup because it is not fit to work with the amount of data you have thrown at it. Either there is something wrong with the configuration, or you have a lot of other activity going on:

I can see that there is constant read rate of 5MB/s and write rate of 10MB/s to .mdf file.

I can beat that on my girlfriend's 2-year-old laptop. Given a read speed of 5 MB/s and a table of 150 GB, it would take about 8.5 hours (150 × 1024 MB / 5 MB/s ≈ 30,700 seconds) just to scan through the table once. And that is assuming the database adds 0% overhead, which is not the case.

Upvotes: 1

IamIC

Reputation: 18239

A practical way to test whether something is out of the ordinary is to update only some of the data. Write a view that selects, say, the top 10,000 rows, and run the update against the view.
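
A sketch of that test (the view name is made up; the column names are taken from the question):

CREATE VIEW dbo.stage_table_sample
AS
SELECT TOP (10000) column1, column2
FROM dbo.stage_table;
GO

-- Run the same update, but only against the 10,000 rows exposed by the view.
UPDATE dbo.stage_table_sample
SET column2 = SUBSTRING(column1, 1, CHARINDEX('.', column1) - 1);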

If the 10,000 rows update in what you would expect to be a "normal" time for your server, then it would follow that it is just "a lot of data to update".

If this small update seems unduly long, then investigate further.

At least this gives you a decent testing ground.

Upvotes: 1

Larry Lustig

Reputation: 50970

Honestly, that's a huge amount of work you're doing (text searching and replacing across 150 gigabytes). If the staged data originated outside the database, you might consider doing the text operations there, without any of the database overhead.

Upvotes: 3

Oded

Reputation: 498942

You are doing some string manipulation on a field - something that SQL is notoriously bad at. Consider writing a SQL CLR function that does what you need and use that instead of SUBSTRING([column1], 1, CHARINDEX('.', [column1])-1).
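
The T-SQL side of that might look like the sketch below; the assembly, class, and method names are made up for illustration, and the C# method itself (a static method taking and returning a SqlString) is not shown. CLR integration also has to be enabled first.

-- Sketch only: enable CLR, register a compiled assembly, expose one method as a UDF.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

CREATE ASSEMBLY StringHelpers
FROM 'C:\clr\StringHelpers.dll'        -- hypothetical path and assembly
WITH PERMISSION_SET = SAFE;
GO

CREATE FUNCTION dbo.GetPrefix (@input nvarchar(20))
RETURNS nvarchar(20)
AS EXTERNAL NAME StringHelpers.[StringHelpers].GetPrefix;
GO

-- The update then becomes:
-- UPDATE dbo.stage_table SET column2 = dbo.GetPrefix(column1);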

Upvotes: 1
