Reputation: 628
I need to create a package which splits the huge flat file into multiple flat files.
I have flat file which has 20 million rows and now I need to split this flat file (Each flat file needs to have 55 k rows)
Example:- if there are 111 rows in total, I would have to create 3 files.
file1.txt will have 1-55 rows file2.txt will have 55-110 rows file3.txt will have 1 rows .
What options do I have?
I am using Visual Studio 2012 for this project.
Upvotes: 0
Views: 1052
Reputation: 772
you could try something like this... its pretty rudimentary and I am sure that someone will point out that it is not going to be the most efficient thing but its a solid option. Note that you will need to add some try catch error handling.
int recper = 0; // this is where you will assign the number of records per file
int reccount = 0;
int filecount = 1;
string filename = "testfilename";
string networkDirectory = @"c:\fakepath\";
string fileToRead = @"c:\fakepath\textfile.txt";
using (StreamReader reader = new StreamReader(fileToRead,Encoding.Default,true))
{
while (reader.Peek() > 0)
{
using (StreamWriter writer = new StreamWriter(Path.Combine(networkDirectory, filename + filecount + ".txt"), true, Encoding.Default))
{
writer.Write(reader.ReadLine());
}
reccount++;
// checks on each iteration of the while loop to see if the
// current record count matches the number of records per file
// if sso reset reccount and change increment filecount to change the file name
if (reccount == recper)
{
reccount = 0;
filecount++;
}
}
}
Upvotes: 1
Reputation: 31775
Another way you can do this is in a dataflow:
First use your method of choice to add a "Row Number" column to your data flow (unless there is already one in your flat file output, in which case skip this step and use that):
Then add a MultiCast transformation to your dataflow, and use the row number to split the stream and send it to different destinations:
Row 1 - 55k, -> File1
Row 55001 - 110k -> File2
etc.
Upvotes: 1