Ben Aston

Reputation: 55729

Cutting a database down to size

Say you have a database that has served a company for 10 years. It is 500GB in size, it has myriad tables, stored procedures and triggers.

Now say you wish to create a cut down version of the database to use as a test bed for use in integration testing and for individual testers and developers to spin up instances of to play around with.

In broad terms how would you set about this task?

In case it matters, the database I have in mind is SQL Server 2008.

Edit: removed "unit testing" because of course unit tests should not test db integration

Upvotes: 9

Views: 2787

Answers (6)

CarlosZ

Reputation: 8669

Check out http://jailer.sourceforge.net/. It is a tool that can extract a subset of the data from a database while keeping it referentially consistent. I haven't used it myself, but I've been meaning to.

Upvotes: 2

John Saunders

Reputation: 161773

In my opinion, subsets of "real data" should not be used for unit tests. Unit tests should be independent of the initial contents of the database. They should create the data needed for the specific test, perform the test, then delete the data. Alternatively, the entire test should be within a transaction which is rolled back at the end.

If you do not do this, then your tests will fail when someone decides to delete or change the data they depend on, and you'll waste an enormous amount of time trying to find out why your tests have suddenly started to fail.
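The transaction approach above can be sketched as follows. This is a minimal illustration, not a framework-specific recipe; the table and column names are hypothetical:

```sql
-- A minimal sketch of the rollback pattern: the test creates its own data,
-- exercises the code under test, then rolls everything back so the database
-- is left exactly as it was found.
BEGIN TRANSACTION;

-- Arrange: create the data this specific test needs
INSERT INTO dbo.Customers (CustomerId, Name)
VALUES (9001, 'Test Customer');

-- Act / Assert: run the code under test and check its results here

-- Cleanup: undo everything the test did
ROLLBACK TRANSACTION;
```

Most test frameworks can wrap each test in an ambient transaction (e.g. TransactionScope in .NET) so the rollback happens automatically even when an assertion fails.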

For a QA or Integration system, you should be able to create a subset of your data based on your knowledge of the relations between the tables.

Upvotes: 1

HLGEM

Reputation: 96552

I would under no circumstances allow developers to develop against a smaller database when the code will have to run on one this size. You will hit problems that only surface when things go to prod, which makes this a foolish idea. The queries that work well on small datasets are not the queries that work well on large datasets. The time wasted writing queries that cannot run against production is one reason why it is foolish to allow developers to work with a small set of data.

Upvotes: 1

Jay

Reputation: 27464

If your tables all consisted of unrelated data, you could just pick X random records from each table. I'm guessing that the problem is that the tables are NOT unrelated, so if, say, table A includes a foreign key reference to table B and you just pulled 10% of the records from table A and 10% of the records from table B, you'd have a whole bunch of invalid references from A to B.

I don't know of a general solution to this problem. It depends on the exact structure of your database. I often find that my databases consist of a small number of "central" tables that have lots of references from other tables. That is, I generally find that I have, say, an Order table, and then there's an Order Line table that points to Order, and a Customer table that Order points to, and a Delivery table that points to Order or maybe Order Line, etc, but everything seems to center around "Order". In that case, you could randomly pick some number of Order records, then find all the Customers for those Orders, all the Order Lines for those Orders, etc. I usually also have some number of "code lookup" tables, like a list of all the "order status" codes, another list of all the "customer type" codes, etc. These are usually small, so I just copy them entirely.

If your database is more ... disjointed ... than that, i.e. if it doesn't have any clear centers but is a maze of interrelationships, this could be much more complicated. I think the same principle would apply, though. Pick SOME starting point, select some records from there, then get all the records connected to those records, etc.
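Under the assumptions above, the "start from a central table and follow the references outward" idea might be sketched like this. The Orders/Customers/OrderLines schema and column names here are hypothetical, stand-ins for whatever your central tables actually are:

```sql
-- Randomly sample some orders from the source database into the cut-down copy.
SELECT TOP 10 PERCENT o.*
INTO SampleDb.dbo.Orders
FROM SourceDb.dbo.Orders o
ORDER BY NEWID();   -- NEWID() gives a random ordering, hence a random sample

-- Pull only the customers referenced by the sampled orders,
-- so foreign keys from Orders to Customers remain valid.
SELECT c.*
INTO SampleDb.dbo.Customers
FROM SourceDb.dbo.Customers c
WHERE c.CustomerId IN (SELECT CustomerId FROM SampleDb.dbo.Orders);

-- Pull only the order lines belonging to the sampled orders.
SELECT ol.*
INTO SampleDb.dbo.OrderLines
FROM SourceDb.dbo.OrderLines ol
WHERE ol.OrderId IN (SELECT OrderId FROM SampleDb.dbo.Orders);

-- Small "code lookup" tables are cheap, so copy them wholesale.
SELECT * INTO SampleDb.dbo.OrderStatus FROM SourceDb.dbo.OrderStatus;
```

In practice you would create the target tables from the scripted schema first (so constraints, identity settings, and indexes are preserved) and use INSERT ... SELECT rather than SELECT ... INTO, but the traversal order, central table first, then everything it references and everything that references it, is the same.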

Upvotes: 3

JT Turner

Reputation: 502

I would script the database, including tables, indexes, triggers, and stored procedures, then create a new, empty database from that script. You can then add data to the database as needed for your integration tests.

You can use a tool like http://code.google.com/p/ndbunit/ to load data for the tests; that way the data is part of the test and is removed once the test finishes. I would also run the tests against SQL Express on each developer's local machine, so tests don't fail when multiple developers run them at the same time.

Upvotes: 2

JonH

Reputation: 33143

What about looking at the transaction log file? Make sure you do a backup of the original database.

USE db;
GO
-- Truncate the log by changing the database recovery model to SIMPLE.
ALTER DATABASE db
SET RECOVERY SIMPLE;
GO
-- Shrink the truncated log file to 1 MB.
DBCC SHRINKFILE (db_log, 1);
GO
-- Reset the database recovery model.
ALTER DATABASE db
SET RECOVERY FULL;
GO

I've also found great success in rebuilding indexes and defragmenting.

Tara Kizer posted the script below, and it has proven to help us with database performance. Thanks, Tara Kizer, if you read this!

-- required table
IF OBJECT_ID('DefragmentIndexes') IS NULL
    CREATE TABLE DefragmentIndexes
    (
        DatabaseName nvarchar(100) NOT NULL,
        SchemaName nvarchar(100) NOT NULL,
        TableName nvarchar(100) NOT NULL,
        IndexName nvarchar(100) NOT NULL,
        DefragmentDate datetime NOT NULL,
        PercentFragmented decimal(4, 2) NOT NULL,
        CONSTRAINT PK_DefragmentIndexes PRIMARY KEY CLUSTERED 
        (
            DatabaseName,
            SchemaName,
            TableName,
            IndexName,
            DefragmentDate
        )
    )
GO

IF OBJECT_ID(N'[dbo].[isp_ALTER_INDEX]') IS NOT NULL
    DROP PROC [dbo].[isp_ALTER_INDEX]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-------------------------------------------------------------------------------------------
-- OBJECT NAME           : isp_ALTER_INDEX
--
-- AUTHOR                : Tara Kizer
--
-- INPUTS                : @dbName         - name of the database
--                         @statsMode      - LIMITED, SAMPLED or DETAILED
--                         @defragType     - REORGANIZE (INDEXDEFRAG) or REBUILD (DBREINDEX)
--                         @minFragPercent - minimum fragmentation level
--                         @maxFragPercent - maximum fragmentation level
--                         @minRowCount    - minimum row count
--                         @logHistory     - whether or not to log what got defragmented
--                         @sortInTempdb   - whether or not to sort the index in tempdb;
--                                           recommended if your tempdb is optimized (see BOL for details)
--
-- OUTPUTS               : None
--
-- DEPENDENCIES          : DefragmentIndexes, sys.dm_db_index_physical_stats, sys.objects, sys.schemas, 
--                         sys.indexes, sys.partitions, sys.indexes, sys.index_columns, INFORMATION_SCHEMA.COLUMNS
--
-- DESCRIPTION           : Defragments indexes
/*
                           EXEC isp_ALTER_INDEX 
                             @dbName = 'QHOSClient1', 
                             @statsMode = 'SAMPLED', 
                             @defragType = 'REBUILD', 
                             @minFragPercent = 10,
                             @maxFragPercent = 100,
                             @minRowCount = 1000,
                             @logHistory = 1,
                             @sortInTempdb = 1
*/
/*
    http://weblogs.sqlteam.com/tarad/archive/2009/03/27/Defragmenting-Indexes-in-SQL-Server-2005Again.aspx
        Bug Fix - added SET QUOTED_IDENTIFIER ON to the script
        Feature - added logging feature
    http://weblogs.sqlteam.com/tarad/archive/2009/06/23/DefragmentingRebuilding-Indexes-in-SQL-Server-2005.aspx
        Bug Fix - initialized @lobData to 0 for each pass through the loop
        Bug Fix - checked for LOB data in included columns of non-clustered indexes
        Feature - added SORT_IN_TEMPDB option
    http://weblogs.sqlteam.com/tarad/archive/2009/08/31/DefragmentingRebuilding-Indexes-in-SQL-server-2005-and-2008.aspx
        Bug Fix - added index_level = 0 to sys.dm_db_index_physical_stats query
    http://weblogs.sqlteam.com/tarad/archive/2009/11/03/DefragmentingRebuilding-Indexes-in-SQL-Server-2005-and-2008Again.aspx
        Bug Fix - for SQL Server 2008, @indexType could be 'XML INDEX' or 'PRIMARY XML INDEX' for XML indexes
*/ 
-------------------------------------------------------------------------------------------
CREATE PROC [dbo].[isp_ALTER_INDEX]
(
    @dbName sysname, 
    @statsMode varchar(8) = 'SAMPLED', 
    @defragType varchar(10) = 'REORGANIZE', 
    @minFragPercent int = 25, 
    @maxFragPercent int = 100, 
    @minRowCount int = 0,
    @logHistory bit = 0,
    @sortInTempdb bit = 0
)
AS

SET NOCOUNT ON

IF @statsMode NOT IN ('LIMITED', 'SAMPLED', 'DETAILED')
BEGIN
    RAISERROR('@statsMode must be LIMITED, SAMPLED or DETAILED', 16, 1)
    RETURN
END

IF @defragType NOT IN ('REORGANIZE', 'REBUILD')
BEGIN
    RAISERROR('@defragType must be REORGANIZE or REBUILD', 16, 1)
    RETURN
END

DECLARE 
    @i int, @objectId int, @objectName sysname, @indexId int, @indexName sysname, 
    @schemaName sysname, @partitionNumber int, @partitionCount int,
    @sql nvarchar(4000), @edition int, @parmDef nvarchar(500), @allocUnitType nvarchar(60),
    @indexType nvarchar(60), @online bit, @disabled bit, @dataType nvarchar(128),
    @charMaxLen int, @allowPageLocks bit, @lobData bit, @fragPercent float

SELECT @edition = CONVERT(int, SERVERPROPERTY('EngineEdition'))

SELECT 
    IDENTITY(int, 1, 1) AS FragIndexId, 
    [object_id] AS ObjectId, 
    index_id AS IndexId, 
    avg_fragmentation_in_percent AS FragPercent, 
    record_count AS RecordCount, 
    partition_number AS PartitionNumber,
    index_type_desc AS IndexType,
    alloc_unit_type_desc AS AllocUnitType
INTO #FragIndex
FROM sys.dm_db_index_physical_stats (DB_ID(@dbName), NULL, NULL, NULL, @statsMode)
WHERE 
    avg_fragmentation_in_percent > @minFragPercent AND 
    avg_fragmentation_in_percent < @maxFragPercent AND 
    index_id > 0 AND
    index_level = 0
ORDER BY ObjectId

-- LIMITED does not include data for record_count
IF @statsMode IN ('SAMPLED', 'DETAILED')
    DELETE FROM #FragIndex
    WHERE RecordCount < @minRowCount

SELECT @i = MIN(FragIndexId) 
FROM #FragIndex

SELECT 
    @objectId = ObjectId, 
    @indexId = IndexId, 
    @fragPercent = FragPercent,
    @partitionNumber = PartitionNumber,
    @indexType = IndexType,
    @allocUnitType = AllocUnitType
FROM #FragIndex
WHERE FragIndexId = @i

WHILE @@ROWCOUNT <> 0
BEGIN
    -- get the table and schema names for the index
    SET @sql = '
        SELECT @objectName = o.[name], @schemaName = s.[name]
        FROM ' + QUOTENAME(@dbName) + '.sys.objects o
        JOIN ' + QUOTENAME(@dbName) + '.sys.schemas s 
            ON s.schema_id = o.schema_id
        WHERE o.[object_id] = @objectId'

    SET @parmDef = N'@objectId int, @objectName sysname OUTPUT, @schemaName sysname OUTPUT'

    EXEC sp_executesql 
        @sql, @parmDef, @objectId = @objectId, 
        @objectName = @objectName OUTPUT, @schemaName = @schemaName OUTPUT

    -- get index information
    SET @sql = '
        SELECT @indexName = [name], @disabled = is_disabled, @allowPageLocks = allow_page_locks
        FROM ' + QUOTENAME(@dbName) + '.sys.indexes
        WHERE [object_id] = @objectId AND index_id = @indexId'

    SET @parmDef = N'
            @objectId int, @indexId int, @indexName sysname OUTPUT, 
            @disabled bit OUTPUT, @allowPageLocks bit OUTPUT'

    EXEC sp_executesql 
        @sql, @parmDef, @objectId = @objectId, @indexId = @indexId, 
        @indexName = @indexName OUTPUT, @disabled = @disabled OUTPUT, 
        @allowPageLocks = @allowPageLocks OUTPUT

    SET @lobData = 0

    -- for clustered indexes, check for columns in the table that use a LOB data type
    IF @indexType = 'CLUSTERED INDEX'
    BEGIN
        -- CHARACTER_MAXIMUM_LENGTH column will equal -1 for max size or xml
        SET @sql = '
            SELECT @lobData = 1
            FROM ' + QUOTENAME(@dbName) + '.INFORMATION_SCHEMA.COLUMNS c
            WHERE   TABLE_SCHEMA = @schemaName AND
                    TABLE_NAME = @objectName AND
                    (DATA_TYPE IN (''text'', ''ntext'', ''image'') OR 
                    CHARACTER_MAXIMUM_LENGTH = -1)'

        SET @parmDef = N'@schemaName sysname, @objectName sysname, @lobData bit OUTPUT'

        EXEC sp_executesql 
            @sql, @parmDef, @schemaName = @schemaName, @objectName = @objectName, 
            @lobData = @lobData OUTPUT
    END
    -- for non-clustered indexes, check for LOB data type in the included columns
    ELSE IF @indexType = 'NONCLUSTERED INDEX'
    BEGIN
        SET @sql = '
            SELECT @lobData = 1
            FROM ' + QUOTENAME(@dbName) + '.sys.indexes i
            JOIN ' + QUOTENAME(@dbName) + '.sys.index_columns ic
                ON i.object_id = ic.object_id
            JOIN ' + QUOTENAME(@dbName) + '.INFORMATION_SCHEMA.COLUMNS c
                ON ic.column_id = c.ORDINAL_POSITION
            WHERE   c.TABLE_SCHEMA = @schemaName AND
                    c.TABLE_NAME = @objectName AND
                    i.name = @indexName AND
                    ic.is_included_column = 1 AND
                    (c.DATA_TYPE IN (''text'', ''ntext'', ''image'') OR c.CHARACTER_MAXIMUM_LENGTH = -1)'

        SET @parmDef = N'@schemaName sysname, @objectName sysname, @indexName sysname, @lobData bit OUTPUT'

        EXEC sp_executesql 
            @sql, @parmDef, @schemaName = @schemaName, @objectName = @objectName, 
            @indexName = @indexName, @lobData = @lobData OUTPUT
    END

    -- get partition information for the index
    SET @sql = '
        SELECT @partitionCount = COUNT(*)
        FROM ' + QUOTENAME(@dbName) + '.sys.partitions
        WHERE [object_id] = @objectId AND index_id = @indexId'

    SET @parmDef = N'@objectId int, @indexId int, @partitionCount int OUTPUT'

    EXEC sp_executesql 
        @sql, @parmDef, @objectId = @objectId, @indexId = @indexId, 
        @partitionCount = @partitionCount OUTPUT

    -- Developer and Enterprise have the ONLINE = ON option for REBUILD.
    -- Indexes, including indexes on global temp tables, can be rebuilt online with the following exceptions:
    -- disabled indexes, XML indexes, indexes on local temp tables, partitioned indexes,
    -- clustered indexes if the underlying table contains LOB data types (text, ntext, image, varchar(max), 
    -- nvarchar(max), varbinary(max) or xml), and
    -- nonclustered indexes that are defined with LOB data type columns.
    -- When reorganizing and page locks are disabled for the index, we'll switch to rebuild later on,
    -- so we need to get set up with the proper online option.
    IF @edition = 3 AND (@defragType = 'REBUILD' OR (@defragType = 'REORGANIZE' AND @allowPageLocks = 0))
    BEGIN
        SET @online = 
                CASE
                    WHEN @indexType IN ('XML INDEX', 'PRIMARY XML INDEX') THEN 0
                    WHEN @indexType = 'NONCLUSTERED INDEX' AND @allocUnitType = 'LOB_DATA' THEN 0
                    WHEN @lobData = 1 THEN 0
                    WHEN @disabled = 1 THEN 0
                    WHEN @partitionCount > 1 THEN 0
                    ELSE 1
                END
    END
    ELSE
        SET @online = 0

    -- build the ALTER INDEX statement
    SET @sql = 'ALTER INDEX ' + QUOTENAME(@indexName) + ' ON ' + QUOTENAME(@dbName) + '.' + 
        QUOTENAME(@schemaName) + '.' + QUOTENAME(@objectName) + 
        CASE
            WHEN @defragType = 'REORGANIZE' AND @allowPageLocks = 0 THEN ' REBUILD'
            ELSE ' ' + @defragType
        END

    -- WITH options
    IF @online = 1 OR @sortInTempdb = 1
    BEGIN   
        SET @sql = @sql + ' WITH (' + 
            CASE
                WHEN @online = 1 AND @sortInTempdb = 1 THEN 'ONLINE = ON, SORT_IN_TEMPDB = ON'
                WHEN @online = 1 AND @sortInTempdb = 0 THEN 'ONLINE = ON'
                WHEN @online = 0 AND @sortInTempdb = 1 THEN 'SORT_IN_TEMPDB = ON'
            END + ')'
    END

    IF @partitionCount > 1 AND @disabled = 0 AND @indexType <> 'XML INDEX'
        SET @sql = @sql + ' PARTITION = ' + CAST(@partitionNumber AS varchar(10))

    -- run the ALTER INDEX statement
    EXEC (@SQL)

    -- log some information into a history table
    IF @logHistory = 1
        INSERT INTO DefragmentIndexes (DatabaseName, SchemaName, TableName, IndexName, DefragmentDate, PercentFragmented)
        VALUES(@dbName, @schemaName, @objectName, @indexName, GETDATE(), @fragPercent)

    SELECT @i = MIN(FragIndexId) 
    FROM #FragIndex
    WHERE FragIndexId > @i

    SELECT 
        @objectId = ObjectId, 
        @indexId = IndexId, 
        @fragPercent = FragPercent,
        @partitionNumber = PartitionNumber,
        @indexType = IndexType,
        @allocUnitType = AllocUnitType
    FROM #FragIndex
    WHERE FragIndexId = @i
END

GO

The original post is here:

http://weblogs.sqlteam.com/tarad/archive/2009/08/31/DefragmentingRebuilding-Indexes-in-SQL-server-2005-and-2008.aspx

Aside from rebuilding the indexes and defragging, the only other thing you can do is get rid of data. If your PKs are int / bigint identity columns, you can then reseed them using DBCC CHECKIDENT ('tablename', RESEED, value).
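A minimal sketch of the reseed step, with an illustrative table name, setting the seed just above the highest surviving key so new inserts don't collide with remaining rows:

```sql
-- After purging rows, reseed the identity column based on what is left.
-- dbo.Orders / OrderId are hypothetical names; substitute your own table.
DECLARE @maxId bigint;

SELECT @maxId = ISNULL(MAX(OrderId), 0)
FROM dbo.Orders;

-- The next identity value generated will be @maxId + 1.
DBCC CHECKIDENT ('dbo.Orders', RESEED, @maxId);
```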

You could use ALTER INDEX ALL ON MyTable REBUILD to rebuild your indexes on your table.

Upvotes: 3
