Reputation: 55729
Say you have a database that has served a company for 10 years. It is 500 GB in size and has myriad tables, stored procedures and triggers.
Now say you wish to create a cut-down version of the database to serve as a test bed for integration testing, and for individual testers and developers to spin up instances of and play around with.
In broad terms how would you set about this task?
In case it matters, the database I have in mind is SQL Server 2008.
Edit: removed "unit testing" because of course unit tests should not test db integration
Upvotes: 9
Views: 2787
Reputation: 8669
Check out http://jailer.sourceforge.net/. It is a tool that can extract a subset of the data from a DB while keeping it referentially consistent. I haven't used it myself, but I've been meaning to.
Upvotes: 2
Reputation: 161773
In my opinion, subsets of "real data" should not be used for unit tests. Unit tests should be independent of the initial contents of the database. They should create the data needed for the specific test, perform the test, then delete the data. Alternatively, the entire test should be within a transaction which is rolled back at the end.
If you do not do this, then your tests will fail when someone decides to delete or change the data they depend on, and you'll waste an enormous amount of time trying to find out why your tests have suddenly started to fail.
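The transaction-based approach described above might be sketched like this in T-SQL; the table and column names here are purely illustrative, not from any real schema:

```sql
-- Wrap a data-dependent test in a transaction so the database
-- is left exactly as it was, no matter what the test does.
BEGIN TRANSACTION;

-- create the data this specific test needs
INSERT INTO Customers (CustomerName, CustomerType)
VALUES ('Test Customer', 'RETAIL');

-- ... run the code under test against this row,
--     assert on the results ...

-- discard everything the test created
ROLLBACK TRANSACTION;
```

This keeps each test self-contained: it never depends on pre-existing rows, and it cannot pollute the database for other tests.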
For a QA or Integration system, you should be able to create a subset of your data based on your knowledge of the relations between the tables.
Upvotes: 1
Reputation: 96552
I would under no circumstances allow developers to develop against a smaller database when the code will have to run on one this size. You will have problems that only arise when things go to prod, and this is a foolish idea. The queries that work well on small datasets are not the queries that work well on large datasets. The time wasted writing queries that cannot run on production is one reason why it is foolish to allow developers to work with a small set of data.
Upvotes: 1
Reputation: 27464
If your tables all consisted of unrelated data, you could just pick X random records from each table. I'm guessing that the problem is that the tables are NOT unrelated, so if, say, table A includes a foreign key reference to table B and you just pulled 10% of the records from table A and 10% of the records from table B, you'd have a whole bunch of invalid references from A to B.
I don't know of a general solution to this problem. It depends on the exact structure of your database. I often find that my databases consist of a small number of "central" tables that have lots of references from other tables. That is, I generally find that I have, say, an Order table, and then there's an Order Line table that points to Order, and a Customer table that Order points to, and a Delivery table that points to Order or maybe Order Line, etc, but everything seems to center around "Order". In that case, you could randomly pick some number of Order records, then find all the Customers for those Orders, all the Order Lines for those Orders, etc. I usually also have some number of "code lookup" tables, like a list of all the "order status" codes, another list of all the "customer type" codes, etc. These are usually small, so I just copy them entirely.
If your database is more ... disjointed ... than that, i.e. if it doesn't have any clear centers but is a maze of interrelationships, this could be much more complicated. I think the same principle would apply, though. Pick SOME starting point, select some records from there, then get all the records connected to those records, etc.
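The "pick a central table and fan out" idea above could be sketched roughly like this; Orders, OrderLines, Customers and OrderStatus are hypothetical names standing in for whatever your central and lookup tables are:

```sql
-- Randomly sample some "central" rows.
SELECT TOP 1000 *
INTO SampleOrders
FROM Orders
ORDER BY NEWID();  -- NEWID() gives a random order

-- Pull every child row that points at a sampled order.
SELECT ol.*
INTO SampleOrderLines
FROM OrderLines ol
JOIN SampleOrders so ON ol.OrderId = so.OrderId;

-- Pull every parent row that a sampled order points at.
SELECT c.*
INTO SampleCustomers
FROM Customers c
WHERE c.CustomerId IN (SELECT CustomerId FROM SampleOrders);

-- Small code-lookup tables are copied wholesale.
SELECT * INTO SampleOrderStatus FROM OrderStatus;
```

The key point is the direction of traversal: from the sampled set you follow foreign keys outward in both directions (children of sampled rows, parents of sampled rows), so every reference in the subset resolves.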
Upvotes: 3
Reputation: 502
I would script the database, including tables, indexes, triggers, and stored procedures, then create a new, empty database with this script. Now you can add data to the database as needed for your integration tests.
You can use tools like http://code.google.com/p/ndbunit/ to load data for the tests; that way the data is part of the test and will be removed once the test finishes. I would also run the tests against SQL Server Express on each developer's local computer, so tests don't fail when multiple developers run them at the same time.
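The schema-only approach might look like this in practice; the database name and script file below are hypothetical, and the schema script is assumed to have been generated beforehand (for example via SSMS "Generate Scripts" with schema only):

```sql
-- Create an empty shell database for testing.
CREATE DATABASE MyAppTest;
GO
-- Then apply the generated schema script to it, e.g. from a
-- command prompt against a local Express instance:
--   sqlcmd -S .\SQLEXPRESS -d MyAppTest -i MyAppSchema.sql
```

Because the shell contains no data, each test (or test fixture) loads exactly the rows it needs and nothing else.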
Upvotes: 2
Reputation: 33143
What about looking at the transaction log file? Make sure you do a backup of the original database.
USE db;
GO
-- Truncate the log by changing the database recovery model to SIMPLE.
ALTER DATABASE db
SET RECOVERY SIMPLE;
GO
-- Shrink the truncated log file to 1 MB.
DBCC SHRINKFILE (db_log, 1);
GO
-- Reset the database recovery model.
ALTER DATABASE db
SET RECOVERY FULL;
GO
I've also found great success in rebuilding indexes and defragmenting.
Tara Kizer posted this, and it has proven to help us with DB performance. Thanks, Tara Kizer, if you read this!
-- required table
IF OBJECT_ID('DefragmentIndexes') IS NULL
CREATE TABLE DefragmentIndexes
(
DatabaseName nvarchar(100) NOT NULL,
SchemaName nvarchar(100) NOT NULL,
TableName nvarchar(100) NOT NULL,
IndexName nvarchar(100) NOT NULL,
DefragmentDate datetime NOT NULL,
PercentFragmented decimal(4, 2) NOT NULL,
CONSTRAINT PK_DefragmentIndexes PRIMARY KEY CLUSTERED
(
DatabaseName,
SchemaName,
TableName,
IndexName,
DefragmentDate
)
)
GO
IF OBJECT_ID(N'[dbo].[isp_ALTER_INDEX]') IS NOT NULL
DROP PROC [dbo].[isp_ALTER_INDEX]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-------------------------------------------------------------------------------------------
-- OBJECT NAME : isp_ALTER_INDEX
--
-- AUTHOR : Tara Kizer
--
-- INPUTS : @dbName - name of the database
-- @statsMode - LIMITED, SAMPLED or DETAILED
-- @defragType - REORGANIZE (INDEXDEFRAG) or REBUILD (DBREINDEX)
-- @minFragPercent - minimum fragmentation level
-- @maxFragPercent - maximum fragmentation level
-- @minRowCount - minimum row count
-- @logHistory - whether or not to log what got defragmented
-- @sortInTempdb - whether or not to sort the index in tempdb;
-- recommended if your tempdb is optimized (see BOL for details)
--
-- OUTPUTS : None
--
-- DEPENDENCIES : DefragmentIndexes, sys.dm_db_index_physical_stats, sys.objects, sys.schemas,
-- sys.indexes, sys.partitions, sys.indexes, sys.index_columns, INFORMATION_SCHEMA.COLUMNS
--
-- DESCRIPTION : Defragments indexes
/*
EXEC isp_ALTER_INDEX
@dbName = 'QHOSClient1',
@statsMode = 'SAMPLED',
@defragType = 'REBUILD',
@minFragPercent = 10,
@maxFragPercent = 100,
@minRowCount = 1000,
@logHistory = 1,
@sortInTempdb = 1
*/
/*
http://weblogs.sqlteam.com/tarad/archive/2009/03/27/Defragmenting-Indexes-in-SQL-Server-2005Again.aspx
Bug Fix - added SET QUOTED_IDENTIFIER ON to the script
Feature - added logging feature
http://weblogs.sqlteam.com/tarad/archive/2009/06/23/DefragmentingRebuilding-Indexes-in-SQL-Server-2005.aspx
Bug Fix - initialized @lobData to 0 for each pass through the loop
Bug Fix - checked for LOB data in included columns of non-clustered indexes
Feature - added SORT_IN_TEMPDB option
http://weblogs.sqlteam.com/tarad/archive/2009/08/31/DefragmentingRebuilding-Indexes-in-SQL-server-2005-and-2008.aspx
Bug Fix - added index_level = 0 to sys.dm_db_index_physical_stats query
http://weblogs.sqlteam.com/tarad/archive/2009/11/03/DefragmentingRebuilding-Indexes-in-SQL-Server-2005-and-2008Again.aspx
Bug Fix - for SQL Server 2008, @indexType could be 'XML INDEX' or 'PRIMARY XML INDEX' for XML indexes
*/
-------------------------------------------------------------------------------------------
CREATE PROC [dbo].[isp_ALTER_INDEX]
(
@dbName sysname,
@statsMode varchar(8) = 'SAMPLED',
@defragType varchar(10) = 'REORGANIZE',
@minFragPercent int = 25,
@maxFragPercent int = 100,
@minRowCount int = 0,
@logHistory bit = 0,
@sortInTempdb bit = 0
)
AS
SET NOCOUNT ON
IF @statsMode NOT IN ('LIMITED', 'SAMPLED', 'DETAILED')
BEGIN
RAISERROR('@statsMode must be LIMITED, SAMPLED or DETAILED', 16, 1)
RETURN
END
IF @defragType NOT IN ('REORGANIZE', 'REBUILD')
BEGIN
RAISERROR('@defragType must be REORGANIZE or REBUILD', 16, 1)
RETURN
END
DECLARE
@i int, @objectId int, @objectName sysname, @indexId int, @indexName sysname,
@schemaName sysname, @partitionNumber int, @partitionCount int,
@sql nvarchar(4000), @edition int, @parmDef nvarchar(500), @allocUnitType nvarchar(60),
@indexType nvarchar(60), @online bit, @disabled bit, @dataType nvarchar(128),
@charMaxLen int, @allowPageLocks bit, @lobData bit, @fragPercent float
SELECT @edition = CONVERT(int, SERVERPROPERTY('EngineEdition'))
SELECT
IDENTITY(int, 1, 1) AS FragIndexId,
[object_id] AS ObjectId,
index_id AS IndexId,
avg_fragmentation_in_percent AS FragPercent,
record_count AS RecordCount,
partition_number AS PartitionNumber,
index_type_desc AS IndexType,
alloc_unit_type_desc AS AllocUnitType
INTO #FragIndex
FROM sys.dm_db_index_physical_stats (DB_ID(@dbName), NULL, NULL, NULL, @statsMode)
WHERE
avg_fragmentation_in_percent > @minFragPercent AND
avg_fragmentation_in_percent < @maxFragPercent AND
index_id > 0 AND
index_level = 0
ORDER BY ObjectId
-- LIMITED does not include data for record_count
IF @statsMode IN ('SAMPLED', 'DETAILED')
DELETE FROM #FragIndex
WHERE RecordCount < @minRowCount
SELECT @i = MIN(FragIndexId)
FROM #FragIndex
SELECT
@objectId = ObjectId,
@indexId = IndexId,
@fragPercent = FragPercent,
@partitionNumber = PartitionNumber,
@indexType = IndexType,
@allocUnitType = AllocUnitType
FROM #FragIndex
WHERE FragIndexId = @i
WHILE @@ROWCOUNT <> 0
BEGIN
-- get the table and schema names for the index
SET @sql = '
SELECT @objectName = o.[name], @schemaName = s.[name]
FROM ' + QUOTENAME(@dbName) + '.sys.objects o
JOIN ' + QUOTENAME(@dbName) + '.sys.schemas s
ON s.schema_id = o.schema_id
WHERE o.[object_id] = @objectId'
SET @parmDef = N'@objectId int, @objectName sysname OUTPUT, @schemaName sysname OUTPUT'
EXEC sp_executesql
@sql, @parmDef, @objectId = @objectId,
@objectName = @objectName OUTPUT, @schemaName = @schemaName OUTPUT
-- get index information
SET @sql = '
SELECT @indexName = [name], @disabled = is_disabled, @allowPageLocks = allow_page_locks
FROM ' + QUOTENAME(@dbName) + '.sys.indexes
WHERE [object_id] = @objectId AND index_id = @indexId'
SET @parmDef = N'
@objectId int, @indexId int, @indexName sysname OUTPUT,
@disabled bit OUTPUT, @allowPageLocks bit OUTPUT'
EXEC sp_executesql
@sql, @parmDef, @objectId = @objectId, @indexId = @indexId,
@indexName = @indexName OUTPUT, @disabled = @disabled OUTPUT,
@allowPageLocks = @allowPageLocks OUTPUT
SET @lobData = 0
-- for clustered indexes, check for columns in the table that use a LOB data type
IF @indexType = 'CLUSTERED INDEX'
BEGIN
-- CHARACTER_MAXIMUM_LENGTH column will equal -1 for max size or xml
SET @sql = '
SELECT @lobData = 1
FROM ' + QUOTENAME(@dbName) + '.INFORMATION_SCHEMA.COLUMNS c
WHERE TABLE_SCHEMA = @schemaName AND
TABLE_NAME = @objectName AND
(DATA_TYPE IN (''text'', ''ntext'', ''image'') OR
CHARACTER_MAXIMUM_LENGTH = -1)'
SET @parmDef = N'@schemaName sysname, @objectName sysname, @lobData bit OUTPUT'
EXEC sp_executesql
@sql, @parmDef, @schemaName = @schemaName, @objectName = @objectName,
@lobData = @lobData OUTPUT
END
-- for non-clustered indexes, check for LOB data type in the included columns
ELSE IF @indexType = 'NONCLUSTERED INDEX'
BEGIN
SET @sql = '
SELECT @lobData = 1
FROM ' + QUOTENAME(@dbName) + '.sys.indexes i
JOIN ' + QUOTENAME(@dbName) + '.sys.index_columns ic
ON i.object_id = ic.object_id
JOIN ' + QUOTENAME(@dbName) + '.INFORMATION_SCHEMA.COLUMNS c
ON ic.column_id = c.ORDINAL_POSITION
WHERE c.TABLE_SCHEMA = @schemaName AND
c.TABLE_NAME = @objectName AND
i.name = @indexName AND
ic.is_included_column = 1 AND
(c.DATA_TYPE IN (''text'', ''ntext'', ''image'') OR c.CHARACTER_MAXIMUM_LENGTH = -1)'
SET @parmDef = N'@schemaName sysname, @objectName sysname, @indexName sysname, @lobData bit OUTPUT'
EXEC sp_executesql
@sql, @parmDef, @schemaName = @schemaName, @objectName = @objectName,
@indexName = @indexName, @lobData = @lobData OUTPUT
END
-- get partition information for the index
SET @sql = '
SELECT @partitionCount = COUNT(*)
FROM ' + QUOTENAME(@dbName) + '.sys.partitions
WHERE [object_id] = @objectId AND index_id = @indexId'
SET @parmDef = N'@objectId int, @indexId int, @partitionCount int OUTPUT'
EXEC sp_executesql
@sql, @parmDef, @objectId = @objectId, @indexId = @indexId,
@partitionCount = @partitionCount OUTPUT
-- Developer and Enterprise have the ONLINE = ON option for REBUILD.
-- Indexes, including indexes on global temp tables, can be rebuilt online with the following exceptions:
-- disabled indexes, XML indexes, indexes on local temp tables, partitioned indexes,
-- clustered indexes if the underlying table contains LOB data types (text, ntext, image, varchar(max),
-- nvarchar(max), varbinary(max) or xml), and
-- nonclustered indexes that are defined with LOB data type columns.
-- When reorganizing and page locks are disabled for the index, we'll switch to rebuild later on,
-- so we need to get setup with the proper online option.
IF @edition = 3 AND (@defragType = 'REBUILD' OR (@defragType = 'REORGANIZE' AND @allowPageLocks = 0))
BEGIN
SET @online =
CASE
WHEN @indexType IN ('XML INDEX', 'PRIMARY XML INDEX') THEN 0
WHEN @indexType = 'NONCLUSTERED INDEX' AND @allocUnitType = 'LOB_DATA' THEN 0
WHEN @lobData = 1 THEN 0
WHEN @disabled = 1 THEN 0
WHEN @partitionCount > 1 THEN 0
ELSE 1
END
END
ELSE
SET @online = 0
-- build the ALTER INDEX statement
SET @sql = 'ALTER INDEX ' + QUOTENAME(@indexName) + ' ON ' + QUOTENAME(@dbName) + '.' +
QUOTENAME(@schemaName) + '.' + QUOTENAME(@objectName) +
CASE
WHEN @defragType = 'REORGANIZE' AND @allowPageLocks = 0 THEN ' REBUILD'
ELSE ' ' + @defragType
END
-- WITH options
IF @online = 1 OR @sortInTempdb = 1
BEGIN
SET @sql = @sql + ' WITH (' +
CASE
WHEN @online = 1 AND @sortInTempdb = 1 THEN 'ONLINE = ON, SORT_IN_TEMPDB = ON'
WHEN @online = 1 AND @sortInTempdb = 0 THEN 'ONLINE = ON'
WHEN @online = 0 AND @sortInTempdb = 1 THEN 'SORT_IN_TEMPDB = ON'
END + ')'
END
IF @partitionCount > 1 AND @disabled = 0 AND @indexType <> 'XML INDEX'
SET @sql = @sql + ' PARTITION = ' + CAST(@partitionNumber AS varchar(10))
-- run the ALTER INDEX statement
EXEC (@sql)
-- log some information into a history table
IF @logHistory = 1
INSERT INTO DefragmentIndexes (DatabaseName, SchemaName, TableName, IndexName, DefragmentDate, PercentFragmented)
VALUES(@dbName, @schemaName, @objectName, @indexName, GETDATE(), @fragPercent)
SELECT @i = MIN(FragIndexId)
FROM #FragIndex
WHERE FragIndexId > @i
SELECT
@objectId = ObjectId,
@indexId = IndexId,
@fragPercent = FragPercent,
@partitionNumber = PartitionNumber,
@indexType = IndexType,
@allocUnitType = AllocUnitType
FROM #FragIndex
WHERE FragIndexId = @i
END
GO
The original post is here:
Aside from rebuilding the indexes and defragging, the only other thing you can do is get rid of data. If your PKs are int / bigint identity columns, this will then allow you to reseed the PK using DBCC CHECKIDENT ('tablename', RESEED, value).
You could use ALTER INDEX ALL ON MyTable REBUILD to rebuild the indexes on your table.
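Put together, the delete-then-reseed-then-rebuild sequence might look like this; MyTable, its CreatedDate column, and the cutoff value are all hypothetical:

```sql
-- Purge old rows to shrink the dataset.
DELETE FROM MyTable
WHERE CreatedDate < '20050101';

-- Reseed the identity column; the next inserted row
-- will get the value after the reseed point.
DBCC CHECKIDENT ('MyTable', RESEED, 1000);

-- Rebuild all indexes on the table after the mass delete.
ALTER INDEX ALL ON MyTable REBUILD;
```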
Upvotes: 3