Reputation: 33
I have a requirement of exporting the table data into a flat file (.csv format) with each of the files containing only 5000 records each. If the table has 10000 records then it has to create 2 files with 5000 records each. The records in the table will increase on daily basis. So basically I am looking for a dynamic solution which will export "n" number of records to "n" number of files with only 5000 records each.
*A simple visualization: Assume the table has 10230 records. What i need is:
File_1.csv - 1 to 5000 records
File_2.csv - 5001 to 10000 records
File_3.csv - 10001 to 10230 records*
I have tried BCP command for the above mentioned logic. Can this be done using Data Flow Task?
Upvotes: 1
Views: 4624
Reputation: 5246
Well, it can be done with standard SSIS components ans SQL 2012+. Idea is the following - use SELECT ... ORDER BY ... OFFSET <Row offset> ROWS FETCH NEXT <Row number> ROWS
as bucket source and use it together with FOR container and Flat File Destination with expressions.
More details:
"SELECT count(*) FROM ... ORDER BY ... OFFSET "+(DT_WSTR,20)[User::Iterator]*[User::Bucket_Size]+" ROWS "
. This command gives you remaining rows count in the current bucket. "SELECT .. FROM ... ORDER BY ... OFFSET "+(DT_WSTR,20)[User::Iterator]*[User::Bucket_Size]+" ROWS FETCH NEXT "+(DT_WSTR,20)[User::Bucket_Size]+" ROWS"
.This is it.
You can optimize it if the source table is not modified during export; first (before For Loop) fetch number of rows and figure out number of buckets, and do this number of iterations. Thus you avoid repeating select count(*) statements in the loop.
Upvotes: 0
Reputation: 61201
No, that is not something SSIS is going to support well natively.
A Script Task, or Script Component acting as a destination, could accomplish this but you'd be re-inventing a significant portion of the wheel with all the file handling required.
The first step would be to add a row number to all the rows coming from the source in a repeatable fashion. That could be as simple as SELECT *, ROW_NUMBER() OVER (ORDER BY MyTablesKey) AS RN FROM dbo.MyTable
Now that you have a monotonically increasing value associated to each row, you could use the referenced answer to be able to pull the data in a given range if you take the ForEach approach.
If you could make a reasonable upper bounds on how many buckets/files of data you'd ever have, then you could use some of the analytic functions to specify the size of your groupings. Then all of the data is fed into the data flow and you have a conditional split that has that upper bounds worth of output buffers heading to flat file destinations.
An alternative approach would be to export the file as is and then use something like PowerShell to split it up into smaller units. Unix is nice as they have split as a native method for just this sort of thing.
Upvotes: 1