intel0201

Reputation: 33

SSIS - Export table data to flat file in chunks

I have a requirement to export table data to flat files (.csv format), with each file containing only 5000 records. If the table has 10000 records, then it has to create 2 files with 5000 records each. The number of records in the table will increase on a daily basis, so basically I am looking for a dynamic solution that will export "n" records into "n" files with only 5000 records each.

A simple visualization: assume the table has 10230 records. What I need is:

File_1.csv - records 1 to 5000

File_2.csv - records 5001 to 10000

File_3.csv - records 10001 to 10230

I have tried the BCP command for the above-mentioned logic. Can this be done using a Data Flow Task?

Upvotes: 1

Views: 4624

Answers (2)

Ferdipux

Reputation: 5246

Well, it can be done with standard SSIS components and SQL Server 2012+. The idea is the following: use SELECT ... ORDER BY ... OFFSET <Row offset> ROWS FETCH NEXT <Row number> ROWS ONLY as the bucket source, and combine it with a For Loop Container and a Flat File Destination driven by expressions.
More details:

  1. Create a package with an Iterator int variable (initial value 0) and a Flat File Destination whose connection string is defined as an expression, e.g. `"\\Filename_" + (DT_WSTR,10)@[User::Iterator] + ".csv"`. Also define a Bucket_Size int variable or parameter.
  2. Create a For Loop Container. Leave its parameters empty for now. The next steps take place inside the For Loop.
  3. On the Loop Container (or at package level - up to you) create a string variable SQL_rowcount with the expression `"SELECT count(*) FROM (SELECT ... FROM ... ORDER BY ... OFFSET " + (DT_WSTR,20)(@[User::Iterator] * @[User::Bucket_Size]) + " ROWS) AS remaining"`. This query gives you the number of rows left from the current bucket onward.
  4. Create an Execute SQL Task whose command comes from the SQL_rowcount variable, storing the single result into a variable Bucket_Rowcount.
  5. Create a string variable SQL_bucket with the expression `"SELECT ... FROM ... ORDER BY ... OFFSET " + (DT_WSTR,20)(@[User::Iterator] * @[User::Bucket_Size]) + " ROWS FETCH NEXT " + (DT_WSTR,20)@[User::Bucket_Size] + " ROWS ONLY"` (both generated statements are sketched in T-SQL after this list).
  6. Create a simple Data Flow Task - an OLE DB Source with its command from the SQL_bucket variable and the Flat File Destination from step 1.
  7. Now a little trick - we have to define the loop conditions. We do it based on the current bucket's row count - the last bucket has no more than Bucket_Size rows. The continuation condition (checked before entering each iteration) is that the previous bucket still had more than Bucket_Size rows, i.e. at least 1 row is left for the next iteration.
    So, define the following properties for the For Loop Container:
    • InitExpression - @Bucket_Rowcount = @Bucket_Size + 1
    • EvalExpression - @Bucket_Rowcount > @Bucket_Size
    • AssignExpression - @Iterator = @Iterator + 1
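
For reference, here is roughly what the two generated statements from steps 3 and 5 look like in plain T-SQL. This is only a sketch: dbo.MyTable and the Id ordering column are hypothetical names, and the literal 5000 values stand in for the Iterator * Bucket_Size and Bucket_Size expressions.

```sql
-- Remaining-rows query (step 3): how many rows are left from the current offset onward.
-- dbo.MyTable and Id are placeholder names; 5000 stands in for Iterator * Bucket_Size.
SELECT COUNT(*) AS Bucket_Rowcount
FROM (
    SELECT Id
    FROM dbo.MyTable
    ORDER BY Id
    OFFSET 5000 ROWS
) AS remaining;

-- Bucket query (step 5): one bucket of rows for the data flow.
SELECT *
FROM dbo.MyTable
ORDER BY Id
OFFSET 5000 ROWS FETCH NEXT 5000 ROWS ONLY;
```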

This is it.
You can optimize this if the source table is not modified during the export: first (before the For Loop) fetch the total number of rows, figure out the number of buckets, and run that many iterations. This way you avoid repeating the SELECT COUNT(*) statement inside the loop.
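
If you take that route, the up-front bucket calculation can be as simple as the following sketch (dbo.MyTable is again a placeholder, and 5000 matches the Bucket_Size variable):

```sql
-- One-time calculation of how many bucket files are needed.
-- dbo.MyTable is a hypothetical table name; 5000 matches the Bucket_Size variable.
DECLARE @BucketSize int = 5000;

SELECT CEILING(COUNT(*) * 1.0 / @BucketSize) AS BucketCount
FROM dbo.MyTable;
```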

Upvotes: 0

billinkc

Reputation: 61201

No, that is not something SSIS is going to support well natively.

A Script Task, or Script Component acting as a destination, could accomplish this but you'd be re-inventing a significant portion of the wheel with all the file handling required.

The first step would be to add a row number to all the rows coming from the source in a repeatable fashion. That could be as simple as SELECT *, ROW_NUMBER() OVER (ORDER BY MyTablesKey) AS RN FROM dbo.MyTable

Now that you have a monotonically increasing value associated with each row, you could use the referenced answer to pull the data in a given range if you take the ForEach approach.
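
A rough sketch of the per-range query that each ForEach iteration could run, reusing the hypothetical dbo.MyTable / MyTablesKey names from above and hard-coding the bounds of the first 5000-row file:

```sql
-- Pull one file's worth of rows by row-number range.
-- dbo.MyTable, MyTablesKey, and the 1-5000 bounds are placeholders; the loop would
-- substitute the current range for each file.
SELECT t.*
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY MyTablesKey) AS RN
    FROM dbo.MyTable
) AS t
WHERE t.RN BETWEEN 1 AND 5000
ORDER BY t.RN;
```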

If you could put a reasonable upper bound on how many buckets/files of data you'd ever have, then you could use one of the analytic functions to assign each row to a group of a given size. All of the data is then fed into the data flow, and a Conditional Split with that upper bound's worth of outputs routes the rows to the flat file destinations.
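
One way to do that grouping (a sketch, not necessarily the exact analytic function intended here) is to derive a bucket number directly from the row number and route on it in the Conditional Split:

```sql
-- Assign each row a bucket number (0, 1, 2, ...) with at most 5000 rows per bucket.
-- The Conditional Split would route on BucketNumber to a fixed set of flat file destinations.
SELECT *,
       (ROW_NUMBER() OVER (ORDER BY MyTablesKey) - 1) / 5000 AS BucketNumber
FROM dbo.MyTable;
```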

An alternative approach would be to export the file as-is and then use something like PowerShell to split it up into smaller files. Unix is nice here, as it has split as a native utility for just this sort of thing.

Upvotes: 1
