JD Long

Reputation: 60756

Make SQL Server faster at manipulating data - turn off transaction logging?

I use SQL Server 2005 as a data store for a lot of data I do analytic work on. This is not a transactional database as I am not hitting it with updates or capturing real time data. I get a few gigs of data from my clients, load them into SQL Server and do a series of manipulations. I then grab bits of this data and pull them into R where I do most of my analysis. I then push a little data into tables in SQL Server and maybe do a join or two.

I have a heck of a time with the logs in SQL Server getting big and I assume it takes some degree of overhead to create these. How can I configure SQL Server so that it runs with little or no logging? If things get corrupt I am happy to just start from the beginning. Any ideas how to make this all faster?

BTW, no need to tell me how to shrink logs, I'm already doing that. But I wish I didn't have to make the logs in the first place. I'm only using the DB to house the data because it's too big to fit into memory in R.

Should I be using a simpler DB than SQL Server? Feel free to tell me I am killing an ant with a sledgehammer. But please recommend a more appropriately sized hammer. :)

Upvotes: 18

Views: 68418

Answers (6)

MarkusEgle

Reputation: 3065

Code using Entity Framework to configure your database as Richard's answer describes:

using (var dbInstance = new YourEntityFrameworkDB_Context())
{
    // Get the underlying ADO.NET connection from the EF context
    var sqlConfigConn = dbInstance.Database.Connection as SqlConnection;
    sqlConfigConn.Open();

    using (var sqlCmd = new SqlCommand())
    {
        sqlCmd.Connection = sqlConfigConn;
        // Switch the recovery model to SIMPLE so the log is truncated at checkpoints
        sqlCmd.CommandText = "ALTER DATABASE model SET RECOVERY SIMPLE";
        var result = sqlCmd.ExecuteNonQuery();
    }
    sqlConfigConn.Close();
}

And to check whether it was successful, just start Management Studio and look at the database's recovery model (the original answer showed a Management Studio screenshot here).
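
As a rough sketch of that check (this query is my assumption of what the screenshot showed, not part of the original answer), you can ask sys.databases for the current recovery model from Management Studio:

-- List every database together with its current recovery model
SELECT name, recovery_model_desc
FROM   sys.databases;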


EDIT Feb 2018:

MSDN description about the recovery model

╔══════════╦══════════════════════╦══════════════════════════════════════════╗
║ Recovery ║    Description       ║      Recover to a point in time?         ║
║  model   ║                      ║                                          ║
╠══════════╬══════════════════════╬══════════════════════════════════════════╣
║ Simple   ║ No log backups       ║ Can recover only to the end of a backup. ║
║          ║                      ║                                          ║
║ Full     ║ Requires log backups ║ Can recover to a specific point in time, ║
║          ║                      ║ assuming that your backups are complete  ║
║          ║                      ║ up to that point in time.                ║
║          ║                      ║                                          ║
║ Bulk     ║ Requires log backups ║ Can recover to the end of any backup.    ║
║ logged   ║                      ║                                          ║
╚══════════╩══════════════════════╩══════════════════════════════════════════╝

Upvotes: -1

Lee

Reputation: 101

One way to minimize logging when working with large data sets is to use SELECT ... INTO. It creates a brand-new table, and under the simple or bulk-logged recovery model the operation is only minimally logged rather than fully logged.

There are some things to watch for in doing this:

  • Computed columns become regular data columns
  • Indexing and identity columns would need to be established too

When done properly it can save not only space but processing time.
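
For example, here is a minimal sketch of the SELECT ... INTO approach (the table and column names are made up for illustration):

-- Copy only the columns you need into a brand-new work table;
-- under the SIMPLE or BULK_LOGGED recovery model this is minimally logged.
SELECT  [CustomerId],
        [SaleDate],
        [Amount]
INTO    [dbo].[SalesWork]
FROM    [dbo].[SalesRaw]
WHERE   [SaleDate] >= '2010-01-01';

Afterwards you would recreate any indexes and identity columns on the new table, as noted in the list above.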

The alternative is something like what I am doing right now, as an example:

UPDATE [MyTable] 
SET    [Message] = REPLACE([Message], N'Content_Type', N'Content-Type')

This works fine, but it updates the whole table in one huge transaction. Instead, you can do:

-- Track which rows still need updating and which are in the current batch
DECLARE @IDs TABLE ([id] int)
DECLARE @Batch TABLE ([id] int)

-- Collect every row that needs the fix
INSERT INTO @IDs ([ID]) SELECT [ID] FROM [MyTable]

WHILE EXISTS (SELECT 1 FROM @IDs)
BEGIN
  -- Take the next 1,000 ids
  INSERT INTO @Batch ([ID]) SELECT TOP 1000 [Id] FROM @IDs

  -- Update only the rows in the current batch
  UPDATE [MyTable] 
  SET    [Message] = REPLACE([Message], N'Content_Type', N'Content-Type') 
  WHERE  [Id] IN (SELECT [Id] FROM @Batch)

  -- Remove the processed ids and empty the batch for the next pass
  DELETE @IDs WHERE [Id] IN (SELECT [Id] FROM @Batch)
  DELETE @Batch
END

This updates the table 1,000 rows at a time, keeping each transaction small.
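
As an alternative sketch of the same batching idea (not part of the original answer): when the rows that still need fixing can be found with a WHERE clause, an UPDATE TOP (n) loop avoids the helper table variables:

-- Update 1,000 matching rows per iteration until none are left.
-- The [_] in the LIKE pattern escapes the underscore so rows that
-- have already been fixed do not keep matching.
WHILE 1 = 1
BEGIN
    UPDATE TOP (1000) [MyTable]
    SET    [Message] = REPLACE([Message], N'Content_Type', N'Content-Type')
    WHERE  [Message] LIKE N'%Content[_]Type%';

    IF @@ROWCOUNT = 0 BREAK;
END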

Upvotes: 10

LarryB

Reputation: 596

You won’t make your SQL Server much faster by turning off transaction logging, but the log size can be kept smaller by switching to the simple or bulk-logged recovery model, as others have already suggested.

My take on this is that you should never turn off the full recovery model except in special cases like yours, when it’s definitely not needed.

The main reason is that the transaction log in full recovery can be your only hope of recovery after an accidentally executed UPDATE, DELETE or TRUNCATE, when you don’t have backups or not all the data is in the backups.

There are several threads on this topic where reading transaction log was the last hope for recovery.

How can I rollback an UPDATE query in SQL server 2005?

How to undo a delete operation in SQL Server 2005?

Again, in your specific case this is probably not an issue but my guess is that it can be useful to others.

Upvotes: 3

Richard

Reputation: 108975

How can I configure SQL Server so that it runs with little or no logging?

I don't believe you can.

However, if you configure the database (each database on a server can be different) to use the simple recovery model, the log is regularly truncated at checkpoints and won't keep growing. This is done by setting the recovery model to "simple".

With the simple recovery model, the log is only used to hold in-flight transactions until they are fully written into the main database files.
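
As a minimal sketch of that setting (the database name below is a placeholder for your own analytic database):

-- Switch the database to the SIMPLE recovery model so the log
-- is truncated at checkpoints instead of growing until a log backup
ALTER DATABASE [YourAnalyticsDB] SET RECOVERY SIMPLE;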

Upvotes: 13

Mladen Prajdic

Reputation: 15677

To minimize logging, use the simple recovery model and do your work in batches.

Upvotes: 2

James

Reputation: 12796

You can minimize log consumption in SQL Server by changing the database recovery model to simple (see this link). Since you're not dealing with concurrency and transactions, have you considered Microsoft Access?

Upvotes: 7
