disasterkid
disasterkid

Reputation: 7278

Split A SELECT query into several Xml files

This is a simplified version of my problem.

Using the following SELECT statement I output my data as Xml.

SET @result = ( 
    SELECT * FROM MyTable
    FOR XML path('receipts')
)

SELECT @result AS xmloutput

I then write the output into an Xml file that uses a timestamp in the filename:

SET @sqlCommand =
     'bcp "EXEC '  +@db + '.dbo.MyStoreProcedure" queryout "' + @filePath + @fileName + ' "  -T -C 1252 -w'
EXEC master..xp_cmdshell @sqlCommand

In the snippet above:

  1. MyStoreProcedure is basically the code from the first snippet i.e. the SELECT results.
  2. @filename has a structure like YYYYMMDDHHMMSS_customers.xml.

Now the problem is, the program that is supposed to read this Xml file has a limit of 20000 records. What I should do is split the results into separate Xml files. So if the original SELECT query gives 25000 records. They should be split into two non-overlapping files: First one containing 20000 and the second the remaining 5000.

To avoid overwriting on the same file (in case the process goes very fast), we can have a 1-second wait between each batch so that the next file will get a new name.

How can I implement this split?

Thanks in advance.

Upvotes: 2

Views: 1265

Answers (2)

Gottfried Lesigang
Gottfried Lesigang

Reputation: 67311

The function NTILE (Starting with SQL Server 2012) allows you to group your data. If you have a lower version you can easily simulate this with a combination of ROW_NUMBER and integer division.

The following writes your table rows together with a chunk number into a temp table. The WHILE takes chunk after chunk. You can use the chunk-number to add it to your file name.

The function WAITFOR (again 2012+) allows to pause automatically.

DECLARE @ApproxRowsPerChunk INT=10; --will float a little...

SELECT NTILE((SELECT Count(*) FROM sys.objects) / @ApproxRowsPerChunk) OVER(ORDER BY o.object_id) AS ChunkNumber
      ,* 
INTO #StagingTable
FROM sys.objects AS o;

DECLARE @nr INT=1;
DECLARE @maxNr INT=(SELECT MAX(ChunkNumber) FROM #StagingTable);

WHILE @nr<=@maxNr 
BEGIN
    SELECT * FROM #StagingTable WHERE ChunkNumber=@nr FOR XML PATH('test');
    SET @nr=@nr+1;
    WAITFOR DELAY '00:00:01';
END

Hint

This would allow you, to integrate something like 1 of 17 into your XML (and into the file's name if needed).

Upvotes: 2

TJB
TJB

Reputation: 877

please see if the below helps. I basically take your finished myTableresults, assign row_number() and divide them into partitions, I then loop through the partitions as per your batch size, executing your bulk copy in the loop. You may have some more work to do on dynamically setting the file names (you could use the @current_partition variable in your file name builder).

I have commented my code:

-- Dummy data, represents your results set

IF OBJECT_ID('tempdb..#t1') IS NOT NULL
    DROP TABLE #t1

CREATE TABLE #t1 (
    initials VARCHAR(10)
    ,no_cars VARCHAR(10)
    )

INSERT INTO #t1
VALUES
('AA',1)
,('BB',1)
,('CC',1)
,('DD',1)
,('EE',1)
,('FF',1)
,('GG',1)
,('HH',1)
,('II',1)
,('JJ',1)
,('KK',1)

---- end of test data creation

-- Assign query partition size, in your case would be 20,000. Must be float (or other decimal).

DECLARE @partition_size FLOAT = 3;

IF OBJECT_ID('tempdb..#t2') IS NOT NULL
    DROP TABLE #t2;

-- Assign your results set a row number
WITH [cte1] AS (
SELECT
initials
,no_cars
,ROW_NUMBER() OVER (ORDER BY no_cars ASC) AS row_no
FROM #t1
)
-- Assign the query partition by running a ceiling command on the row number, store the results in a temp table
SELECT
initials
,no_cars
,CEILING(row_no/@partition_size) AS query_partition
INTO #t2
FROM cte1

--- Now, create a loop to go through each partition

-- Your business variables

DECLARE @result XML
DECLARE @sqlcommand NVARCHAR(4000)
DECLARE @db VARCHAR(50) = 'db'
DECLARE @filepath VARCHAR(50) = 'C:\temp'
DECLARE @filename VARCHAR(50) = 'dynamic2017010.xml'

-- Find highest partition
DECLARE @current_partition INT = 1
DECLARE @max_partition INT = (SELECT MAX(query_partition) FROM #t2)


WHILE @current_partition <= @max_partition
BEGIN

    SET @result = ( 
        SELECT initials
,no_cars FROM #t2
        WHERE query_partition = @current_partition
        FOR XML path('receipts')
    )
SELECT @result AS xmloutput

-- other code..?

    SET @sqlCommand =
         'bcp "EXEC '  +@db + '.dbo.MyStoreProcedure" queryout "' + @filePath + @fileName + ' "  -T -C 1252 -w'
    EXEC master..xp_cmdshell @sqlCommand

    SET @current_partition += 1

END

Upvotes: 1

Related Questions