Reputation: 488
I have to merge thousands of large files (~200MB each). I would like to know the best way to merge these files. Lines will be conditionally copied to the merged file. Should I use File.AppendAllLines or Stream.CopyTo?
Using File.AppendAllLines
for (int i = 0; i < countryFiles.Length; i++){
    string srcFileName = countryFiles[i];
    string[] countryExtractLines = File.ReadAllLines(srcFileName);
    File.AppendAllLines(actualMergedFileName, countryExtractLines);
}
Using Stream.CopyTo
using (Stream destStream = File.OpenWrite(actualMergedFileName)){
    foreach (string srcFileName in countryFiles){
        using (Stream srcStream = File.OpenRead(srcFileName)){
            srcStream.CopyTo(destStream);
        }
    }
}
Upvotes: 5
Views: 2624
Reputation: 109567
Suppose you have a condition which must be true (i.e. a predicate) for each line in one file that you want to append to another file.
You can efficiently process that as follows:
var filteredLines =
    File.ReadLines("MySourceFileName")
    .Where(line => line.Contains("Target")); // Put your own condition here.
File.AppendAllLines("MyDestinationFileName", filteredLines);
This approach scales to multiple files and avoids loading the entire file into memory.
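To make the multi-file case concrete, here is a minimal sketch of the same lazy pipeline extended over several source files. The file names, the Merge helper, and the "Target" predicate are all placeholders, not part of the original answer:

```csharp
using System;
using System.IO;
using System.Linq;

public static class MergeFiltered
{
    // Lazily streams every line of every source file, keeps only the
    // lines matching the predicate, and appends them to destFile.
    // File.ReadLines is deferred, so at most one line is held in
    // memory at a time, even across thousands of files.
    public static void Merge(string[] sourceFiles, string destFile,
                             Func<string, bool> predicate)
    {
        var filteredLines = sourceFiles
            .SelectMany(File.ReadLines)
            .Where(predicate);
        File.AppendAllLines(destFile, filteredLines);
    }

    public static void Main()
    {
        // Hypothetical sample data; replace with your real file list.
        File.WriteAllLines("country1.txt", new[] { "Target A", "noise" });
        File.WriteAllLines("country2.txt", new[] { "Target B" });
        File.Delete("merged.txt");

        Merge(new[] { "country1.txt", "country2.txt" }, "merged.txt",
              line => line.Contains("Target"));
    }
}
```

Because AppendAllLines itself accepts an IEnumerable&lt;string&gt;, the whole chain stays streaming from first read to last write.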
If instead of appending all the lines to a file, you wanted to replace the contents, you'd do:
File.WriteAllLines("MyDestinationFileName", filteredLines);
instead of
File.AppendAllLines("MyDestinationFileName", filteredLines);
Also note that there are overloads of these methods that allow you to specify the encoding, if you are not using UTF8.
Finally, don't be thrown by the inconsistent method naming: File.ReadLines() does not read all lines into memory, but File.ReadAllLines() does. However, File.WriteAllLines() does NOT buffer all lines in memory or expect them all to be buffered; it takes an IEnumerable&lt;string&gt; as input.
Upvotes: 2
Reputation: 38094
You can write the files one after the other. For example:
static void MergingFiles(string outputFile, params string[] inputTxtDocs)
{
using (Stream outputStream = File.OpenWrite(outputFile))
{
foreach (string inputFile in inputTxtDocs)
{
using (Stream inputStream = File.OpenRead(inputFile))
{
inputStream.CopyTo(outputStream);
}
}
}
}
In my view the code above performs well, because Stream.CopyTo() uses a very simple algorithm and is therefore efficient. A decompiler such as Reflector renders its core as follows:
private void InternalCopyTo(Stream destination, int bufferSize)
{
int num;
byte[] buffer = new byte[bufferSize];
while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
{
destination.Write(buffer, 0, num);
}
}
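For completeness, a minimal call site for this pattern (the file names are hypothetical). Two caveats worth knowing: a byte-level CopyTo cannot apply per-line conditions, so it fits only the unconditional-merge case, and File.OpenWrite does not truncate an existing destination, so delete or File.Create it first if it may already exist. CopyTo also has an overload taking an explicit buffer size:

```csharp
using System.IO;

public static class MergeUsage
{
    public static void Main()
    {
        // Hypothetical inputs; replace with your real ~200 MB files.
        File.WriteAllText("part1.txt", "hello\n");
        File.WriteAllText("part2.txt", "world\n");
        File.Delete("combined.txt"); // OpenWrite would not truncate leftovers.

        using (Stream output = File.OpenWrite("combined.txt"))
        {
            foreach (string inputFile in new[] { "part1.txt", "part2.txt" })
            {
                using (Stream input = File.OpenRead(inputFile))
                {
                    // Overload with an explicit buffer size (in bytes),
                    // which you can tune for very large files.
                    input.CopyTo(output, 81920);
                }
            }
        }
    }
}
```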
Upvotes: 5
Reputation: 41
sab669's answer is correct: you want to use a StreamReader and loop over each line of the file. However, I would suggest writing out each file's lines individually, as otherwise you are going to run out of memory pretty quickly with many 200 MB files.
For example:
foreach (string f in files)
{
    List<string> lines = new List<string>();
    string line;
    int cnt = 0;
    using (StreamReader reader = new StreamReader(f))
    {
        while ((line = reader.ReadLine()) != null)
        {
            // TODO : Put your conditions in here
            lines.Add(line);
            cnt++;
        }
    } // The using block closes the reader; no explicit Close() needed.
    // TODO : Append your lines here using StreamWriter
}
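A filled-in sketch of that pattern, with the two TODOs replaced by placeholders of my own (the "Target" condition and the file names are assumptions): write matches straight to an appending StreamWriter instead of collecting them in a list, so memory use stays constant regardless of file size:

```csharp
using System.IO;

public static class LineFilterMerge
{
    public static void Append(string[] files, string destFile)
    {
        // Open the writer once in append mode; disposing it flushes and closes.
        using (StreamWriter writer = new StreamWriter(destFile, append: true))
        {
            foreach (string f in files)
            {
                string line;
                using (StreamReader reader = new StreamReader(f))
                {
                    while ((line = reader.ReadLine()) != null)
                    {
                        // Placeholder condition; substitute your own.
                        if (line.Contains("Target"))
                            writer.WriteLine(line);
                    }
                }
            }
        }
    }
}
```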
Upvotes: 3