Amit

Reputation: 9

Issue with ALTER Partition SWITCH to load data

I have a table which is partitioned by acquisition date.

Table Structure (Table Name: Inventory):

Vendor_Name [varchar](80) NULL,
Model_Name [varchar](80) NULL,
AcquisitionDate [datetime] NOT NULL,
Qty [bigint] NOT NULL

Partition Function for the Inventory Table:

CREATE PARTITION FUNCTION [Inventory_PF_New](datetime) AS RANGE LEFT FOR VALUES 
(N'2012-07-01T00:00:00.000', N'2012-08-01T00:00:00.000', 
N'2012-09-01T00:00:00.000', N'2012-10-01T00:00:00.000', 
N'2012-11-01T00:00:00.000', N'2012-12-01T00:00:00.000', 
N'2013-01-01T00:00:00.000', N'2013-02-01T00:00:00.000', 
N'2013-03-01T00:00:00.000', N'2013-04-01T00:00:00.000', 
N'2013-05-01T00:00:00.000', N'2013-06-01T00:00:00.000', 
N'2013-07-01T00:00:00.000', N'2013-08-01T00:00:00.000', 
N'2013-09-01T00:00:00.000', N'2013-10-01T00:00:00.000', 
N'2013-11-01T00:00:00.000', N'2013-12-01T00:00:00.000', 
N'2014-01-01T00:00:00.000', N'2014-02-01T00:00:00.000', 
N'2014-03-01T00:00:00.000', N'2014-04-01T00:00:00.000'
)

I receive daily feed flat files whose acquisition dates can be up to three months old. I need to process these files and load the data into the actual partitioned table (the Inventory table).

Using BULK INSERT/SSIS I can easily load these files into a temporary table, but how do I use ALTER TABLE ... SWITCH to move the daily data into the actual partitioned table? I know the receiving partition must be empty, and when I load data into the temporary table it could span three months, so I can't simply perform an ALTER SWITCH IN.

Any thoughts on how to implement this? My requirement is to load the daily feed quickly, and that feed can contain up to three months of data.

In short: my table is partitioned by month on acquisition date, and I receive a data feed which can span 3 to 4 months of data. How do I use ALTER TABLE ... SWITCH to load this data into the actual partitioned table?

Upvotes: 1

Views: 6321

Answers (2)

Avarkx

Reputation: 1065

I guess the question I have for you is "are you sure the table is partitioned?" Generally speaking, whether or not you stage your data in a temporary location, inserts into a properly partitioned table should correctly align the data along the primary key used in the partition function.

The example below creates a catalog with a simple partition setup and inserts records into a partitioned table (based loosely on the structure you've divulged; note that a partitioned table needs its partitioning column to be at least part of the primary key, which was not shown in the original example). The results show the growth of just the expected files (20140101 and 20140102) after a single insert and, finally, the actual allocation of those rows across the partitions. The latter queries may help you determine whether your data is being properly allocated; if it isn't, you may be able to glean some insight from the rest of the example on how to fix that!

The code below has been modified to show the SWITCH usage as outlined in the comments:

Create a Test catalog:

USE master;
GO

-- Make a data directory;
-- EXECUTE xp_cmdshell 'mkdir D:\MSSQL\DATA\PartitionExample\';

-- Create a catalog for testing;
IF NOT EXISTS ( SELECT  1
                FROM    sys.databases
                WHERE   name = 'PartitionExample' )
BEGIN
    --DROP DATABASE [PartitionExample];
    CREATE DATABASE [PartitionExample]
    ON PRIMARY (
        NAME = [PartitionExample_dat],
        FILENAME = 'D:\MSSQL\DATA\PartitionExample\PartitionExample.mdf',
        SIZE = 3MB,
        FILEGROWTH = 1MB )
    LOG ON (
        NAME = [PartitionExample_log],
        FILENAME = 'D:\MSSQL\DATA\PartitionExample\PartitionExample.ldf',
        SIZE = 512KB, 
        FILEGROWTH = 512KB );
END;
GO

-- Add filegroups and files for partitions;
IF NOT EXISTS ( SELECT *
                FROM    PartitionExample.sys.sysfiles
                WHERE   name LIKE 'fg[_]DateRange[_]________[_]dat' )
BEGIN
    -- Filegroups
    ALTER DATABASE [PartitionExample]
    ADD FILEGROUP [fg_DateRange_20140101];

    ALTER DATABASE [PartitionExample]
    ADD FILEGROUP [fg_DateRange_20140102];

    ALTER DATABASE [PartitionExample]
    ADD FILEGROUP [fg_DateRange_20140103];

    -- Files
    ALTER DATABASE [PartitionExample]
    ADD FILE (
        NAME = [fg_DateRange_20140101_dat],
        FILENAME = 'D:\MSSQL\DATA\PartitionExample\fg_DateRange_20140101.ndf',
        SIZE = 512KB,
        FILEGROWTH = 512KB )
        TO FILEGROUP [fg_DateRange_20140101];

    ALTER DATABASE PartitionExample
    ADD FILE (
        NAME = [fg_DateRange_20140102_dat],
        FILENAME = 'D:\MSSQL\DATA\PartitionExample\fg_DateRange_20140102.ndf',
        SIZE = 512KB,
        FILEGROWTH = 512KB )
        TO FILEGROUP [fg_DateRange_20140102];

    ALTER DATABASE PartitionExample
    ADD FILE (
        NAME = [fg_DateRange_20140103_dat],
        FILENAME = 'D:\MSSQL\DATA\PartitionExample\fg_DateRange_20140103.ndf',
        SIZE = 512KB,
        FILEGROWTH = 512KB )
        TO FILEGROUP [fg_DateRange_20140103];
END;
GO

Build Partition Scheme / Function:

-- Use the newly created catalog;
USE [PartitionExample];
GO

-- Set up partition function and scheme;
IF NOT EXISTS ( SELECT  1
                FROM    sys.partition_functions
                WHERE   name = 'pf_DateRange' )
BEGIN
    --DROP PARTITION SCHEME [ps_DateRange];
    --DROP PARTITION FUNCTION [pf_DateRange];
    CREATE PARTITION FUNCTION [pf_DateRange] ( DATETIME )
    AS RANGE RIGHT FOR VALUES ( '20140101', '20140102', '20140103' );

    CREATE PARTITION SCHEME [ps_DateRange]
    AS PARTITION [pf_DateRange] TO ( [PRIMARY], 
        [fg_DateRange_20140101], [fg_DateRange_20140102], [fg_DateRange_20140103] );
END;
GO

Create Partitioned Table and Populate:

-- Create table;
IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'Inventory'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.Inventory;
    CREATE TABLE dbo.Inventory
    (
        Inventory_PK        INTEGER IDENTITY( 1, 1 ) NOT NULL,
        AcquisitionDate     DATETIME NOT NULL,
                                PRIMARY KEY ( Inventory_PK, AcquisitionDate ),
        Vendor_Name         VARCHAR( 80 ) NULL,
        Model_Name          VARCHAR( 80 ) NULL,
        Qty                 BIGINT NOT NULL
    ) ON ps_DateRange( AcquisitionDate );
END;
GO

-- "Stage" data, for initial population purposes
SET NOCOUNT ON;

IF NOT EXISTS ( SELECT  1
                FROM    dbo.Inventory )
BEGIN
    DECLARE @i      INTEGER;

    CREATE TABLE dbo.t_StageInventory
    (
        Inventory_PK        INTEGER IDENTITY( 1, 1 ) NOT NULL,
        AcquisitionDate     DATETIME NOT NULL,
                                PRIMARY KEY ( Inventory_PK, AcquisitionDate ),
        Vendor_Name         VARCHAR( 80 ) NULL,
        Model_Name          VARCHAR( 80 ) NULL,
        Qty                 BIGINT NOT NULL
    ) ON ps_DateRange( AcquisitionDate );

        SET @i = 0;
    WHILE ( @i < 100 )
    BEGIN
        INSERT INTO dbo.t_StageInventory ( Vendor_Name, Model_Name, 
            AcquisitionDate, Qty )
        VALUES ( 'VendorName', 'ModelName', '20140101', 1 );
        SET @i = @i + 1;
    END;

    SET @i = 0;
    WHILE ( @i < 100 )
    BEGIN
        INSERT INTO dbo.t_StageInventory ( Vendor_Name, Model_Name, 
            AcquisitionDate, Qty )
        VALUES ( 'VendorName', 'ModelName', '20140102', 1 );
        SET @i = @i + 1;
    END;

    -- Insert data into the partitioned table;
    INSERT INTO dbo.Inventory ( AcquisitionDate, Vendor_Name, 
        Model_Name, Qty )
    SELECT  AcquisitionDate, Vendor_Name, Model_Name, Qty
    FROM    dbo.t_StageInventory;

    DROP TABLE dbo.t_StageInventory;

    SET NOCOUNT OFF;
END;
GO

View Distribution of Data:

SELECT  ObjectName = OBJECT_NAME( p.object_id ),
        PartitionSchemeName = ps.name,
        PartitionFunctionName = pf.name,
        PartitionNumber = p.partition_number,
        FileGroup = fg.name,
        Rows = p.rows
FROM    sys.partitions p
INNER JOIN sys.indexes i
    ON  p.object_id = i.object_id
INNER JOIN sys.partition_schemes ps
    ON  i.data_space_id = ps.data_space_id
INNER JOIN sys.partition_functions pf
    ON  ps.function_id = pf.function_id
INNER JOIN sys.destination_data_spaces dds
    ON  dds.partition_scheme_id = ps.data_space_id
    AND dds.destination_id = p.partition_number
INNER JOIN sys.filegroups fg
    ON  dds.data_space_id = fg.data_space_id
WHERE   p.object_id = OBJECT_ID( 'Inventory' );

Now that the test is set up, we can move on to getting ready for the SWITCH! The staging table is once again populated:

New Stage:

-- "Stage" data from ETL
SET NOCOUNT ON;

DECLARE @i      INTEGER;

CREATE TABLE dbo.t_StageInventory
(
    Inventory_PK        INTEGER IDENTITY( 1, 1 ) NOT NULL,
    AcquisitionDate     DATETIME NOT NULL,
                            PRIMARY KEY ( Inventory_PK, AcquisitionDate ),
    Vendor_Name         VARCHAR( 80 ) NULL,
    Model_Name          VARCHAR( 80 ) NULL,
    Qty                 BIGINT NOT NULL
) ON ps_DateRange( AcquisitionDate );

    SET @i = 0;
WHILE ( @i < 10 )
BEGIN
    INSERT INTO dbo.t_StageInventory ( Vendor_Name, Model_Name, 
        AcquisitionDate, Qty )
    VALUES ( 'VendorName', 'ModelName', '20140102', 1 );
    SET @i = @i + 1;
END;

SET @i = 0;
WHILE ( @i < 100 )
BEGIN
    INSERT INTO dbo.t_StageInventory ( Vendor_Name, Model_Name, 
        AcquisitionDate, Qty )
    VALUES ( 'VendorName', 'ModelName', '20140103', 1 );
    SET @i = @i + 1;
END;

With the staging table populated, we need to determine which rows must be transferred from the stable reporting table (dbo.Inventory) and move them.

Re-Staging Data:

-- Re-stage existing partition data;
DECLARE @UpperBound     DATETIME,
        @LowRange       DATETIME,
        @HighRange      DATETIME;
    SET @UpperBound = '99991231';

SELECT  @LowRange = MIN( CAST( pprv.value AS DATETIME ) ), 
        @HighRange = MAX( ISNULL( CAST( prv.value AS DATETIME ), @UpperBound ) )
FROM    sys.partitions p
INNER JOIN sys.indexes i
    ON  p.object_id = i.object_id
INNER JOIN sys.partition_schemes ps
    ON  i.data_space_id = ps.data_space_id
INNER JOIN sys.partition_functions pf
    ON  ps.function_id = pf.function_id
INNER JOIN sys.destination_data_spaces dds
    ON  dds.partition_scheme_id = ps.data_space_id
    AND dds.destination_id = p.partition_number
INNER JOIN sys.filegroups fg
    ON  dds.data_space_id = fg.data_space_id
LEFT JOIN sys.partition_range_values prv
    ON  ps.function_id = prv.function_id
    AND p.partition_number = prv.boundary_id
LEFT JOIN sys.partition_range_values pprv
    ON  ps.function_id = pprv.function_id
    AND p.partition_number - 1 = pprv.boundary_id
WHERE   p.object_id = OBJECT_ID( 't_StageInventory' )
    AND rows <> 0;

INSERT INTO dbo.t_StageInventory( AcquisitionDate, Vendor_Name, Model_Name, Qty )
SELECT  AcquisitionDate, Vendor_Name, Model_Name, Qty
FROM    dbo.Inventory
WHERE   AcquisitionDate >= @LowRange
    AND AcquisitionDate < @HighRange;

SWITCH Out, then In:

CREATE TABLE dbo.t_SwapInventory
(
    Inventory_PK        INTEGER IDENTITY( 1, 1 ) NOT NULL,
    AcquisitionDate     DATETIME NOT NULL,
                            PRIMARY KEY ( Inventory_PK, AcquisitionDate ),
    Vendor_Name         VARCHAR( 80 ) NULL,
    Model_Name          VARCHAR( 80 ) NULL,
    Qty                 BIGINT NOT NULL
) ON ps_DateRange( AcquisitionDate );

-- Dynamic here...
DECLARE @t_Partition TABLE
(
    partition_number    INTEGER
);

INSERT INTO @t_Partition ( partition_number )
SELECT  DISTINCT p.partition_number
FROM    sys.partitions p
WHERE   p.object_id = OBJECT_ID( 't_StageInventory' )
    AND p.rows <> 0;
    SET @i = @@ROWCOUNT;

DECLARE @SQL            NVARCHAR( MAX ),
        @Partition      INTEGER;
WHILE ( @i > 0 )
BEGIN
    SELECT  TOP 1 @Partition = partition_number
    FROM    @t_Partition;

    DELETE  @t_Partition
    WHERE   partition_number = @Partition;

    -- Swap any existing rows for this partition out of the live table;
    SET @SQL = N'
        ALTER TABLE dbo.Inventory
        SWITCH PARTITION ' + CAST( @Partition AS NVARCHAR( 10 ) ) + N'
            TO dbo.t_SwapInventory PARTITION ' + CAST( @Partition AS NVARCHAR( 10 ) ) + N';';
    EXECUTE sys.sp_executesql @statement = @SQL;

    -- Switch the staged partition into the live table;
    SET @SQL = N'
        ALTER TABLE dbo.t_StageInventory
        SWITCH PARTITION ' + CAST( @Partition AS NVARCHAR( 10 ) ) + N'
            TO dbo.Inventory PARTITION ' + CAST( @Partition AS NVARCHAR( 10 ) ) + N';';
    EXECUTE sys.sp_executesql @statement = @SQL;

    SET @i = @i - 1;
END;

GO
DROP TABLE dbo.t_SwapInventory;

DROP TABLE dbo.t_StageInventory;

SET NOCOUNT OFF;

At this point, the View Distribution of Data query above can be run again for additional verification.

Upvotes: 1

Stuart Ainsworth

Reputation: 12940

I'm not sure if this question and answer are a great fit for Stack Overflow, because this is getting into a relatively complex architectural discussion, but I'll try to keep the ideas simple and let you run with it.

I'm assuming that the reason you are loading into a temporary table from SSIS is so that you can take advantage of a bulk insert; if you're only doing this load once a day, it may not be worth trying my suggestion. Just insert into your target table; the time to complete the insert may be a fair trade against maintaining a more complex setup. However, if you need to do this load and have your data immediately available, then I would do it like so:

  1. SSIS package to staging table (done!)
  2. Switch the staging table (via ALTER TABLE ... SWITCH) into a "Most Recent" table, which is partitioned using the same scheme as your final destination table (monthly) but is empty at the time of load. UNION ALL that table with your final destination table in a view, and make sure your app references that view for all data retrieval statements (see the sketch after this list).
  3. Create a job that trickles data from the "Most Recent" table to the "Final" table. Keep the INSERT and DELETE transactions small and tight, but fast enough that they can complete before the next daily run (see the second sketch below).
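
A minimal sketch of step 2, assuming hypothetical names (dbo.Inventory_Staging for the SSIS-loaded staging table, dbo.Inventory_Recent for the "Most Recent" table, and dbo.v_Inventory for the view). All three tables must share the same partition scheme, and in practice the partition number would be looked up per load rather than hard-coded:

-- Move the freshly loaded partition from staging into the "Most Recent" table;
-- partition 2 stands in for whichever partition the daily feed populated.
ALTER TABLE dbo.Inventory_Staging
SWITCH PARTITION 2 TO dbo.Inventory_Recent PARTITION 2;
GO

-- Expose both tables as one logical table for the application to query.
CREATE VIEW dbo.v_Inventory
AS
SELECT  AcquisitionDate, Vendor_Name, Model_Name, Qty
FROM    dbo.Inventory
UNION ALL
SELECT  AcquisitionDate, Vendor_Name, Model_Name, Qty
FROM    dbo.Inventory_Recent;
GO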

This should allow you to have the benefits of bulk inserts and near-immediate access to your data. The trickle load will slowly empty out the "Most Recent" table (the destination of the ALTER SWITCH), keeping it ready for the next day's switch.
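
A sketch of the trickle job from step 3, again assuming the hypothetical dbo.Inventory_Recent name and an arbitrary 5000-row batch size. DELETE ... OUTPUT INTO moves each batch in a single statement, so every transaction stays short:

SET NOCOUNT ON;

DECLARE @BatchSize  INTEGER,
        @Moved      INTEGER;
    SET @BatchSize = 5000;
    SET @Moved = 1;

WHILE ( @Moved > 0 )
BEGIN
    -- Rows deleted from the "Most Recent" table are inserted into the
    -- final table atomically; each batch is its own small transaction.
    DELETE  TOP ( @BatchSize )
    FROM    dbo.Inventory_Recent
    OUTPUT  deleted.AcquisitionDate, deleted.Vendor_Name,
            deleted.Model_Name, deleted.Qty
    INTO    dbo.Inventory ( AcquisitionDate, Vendor_Name, Model_Name, Qty );

    SET @Moved = @@ROWCOUNT;
END;

SET NOCOUNT OFF;

Note that OUTPUT ... INTO requires the target table to have no enabled triggers and to not participate in foreign keys; if yours does, select each batch into a temp table first and use two statements per transaction instead.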

If there's a possibility that the trickle load will take longer than a day, you could actually build a partitioned view dynamically, adding a new partitioned table for each insert date, and dropping those tables as they get trickled to the final destination, but that's adding a LOT of complexity.

Upvotes: 1
