Reputation: 129
I have C# code that removes non-ASCII characters from an incoming text file and writes the result to a .NonAsciiChars text file. Because the incoming file is XML and its line endings may be either LF only or CRLF, I am not doing the replacement line by line (I am using StreamReader.ReadToEnd()).
The problem is that when the incoming file is huge (around 2 GB), I get the error below. Is there a better way to remove the non-ASCII characters in my case? Incoming files may also be around 4 GB, and I am afraid that at that size the reading step will throw an OutOfMemoryException as well.
Thanks a lot.
DateTime:2014-08-04 12:55:26,035 Thread ID:[1] Log Level:ERROR Logger Property:OS_fileParser.Program property:[(null)] - Message:System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
at System.IO.StreamReader.ReadToEnd()
at OS_fileParser.MyProgram.FormatXmlFile(String inFile) in D:\Test\myProgram.cs:line 530
at OS_fileParser.MyProgram.Run() in D:\Test\myProgram.cs:line 336
myProgram.cs line 530: content = Regex.Replace(content, pattern, "");
myProgram.cs line 336: the call to the following method
const string pattern = @"[^\x20-\x7E]";
string content;
using (var reader = new StreamReader(inFile))
{
content = reader.ReadToEnd();
reader.Close();
}
content = Regex.Replace(content, pattern, "");
using (var writer = new StreamWriter(inFile + ".NonAsciiChars"))
{
writer.Write(content);
writer.Close();
}
using (var myXmlReader = XmlReader.Create(inFile + ".NonAsciiChars", myXmlReaderSettings))
{
try
{
while (myXmlReader.Read())
{
}
}
catch (XmlException ex)
{
Logger.Error("Validation error: " + ex);
}
}
Upvotes: 4
Views: 413
Reputation: 21999
You are getting an OutOfMemoryException because ReadToEnd() tries to hold the whole file in memory at once. To conserve memory, process the file in portions: either line by line, or in chunks through a buffer (reading one byte at a time is slow).
In the simplest case it looks like this:
string line;
using (var reader = new StreamReader(inFile))
using (var writer = new StreamWriter(inFile + ".NonAsciiChars"))
while ((line = reader.ReadLine()) != null)
{
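// process the line here, e.g. line = Regex.Replace(line, pattern, "");
// note: ReadLine() strips the LF/CRLF terminator, so Write() emits no line breaks
writer.Write(line);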
}
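If you would rather not depend on line endings at all, the buffered variant could look roughly like the sketch below. It is only an illustration under a few assumptions: the class name AsciiFilter, the method name StripNonPrintableAscii and the 64 KB buffer size are made up for this example, and the input is assumed to be in an encoding that StreamReader decodes by default (UTF-8). It keeps exactly the characters that the pattern [^\x20-\x7E] from the question leaves behind, so CR, LF and tab are dropped too, just as with the original Regex.Replace call.

```csharp
using System;
using System.IO;

static class AsciiFilter
{
    // Copies inPath to outPath, keeping only characters in the range 0x20-0x7E.
    // Only one buffer's worth of text is held in memory at any time.
    public static void StripNonPrintableAscii(string inPath, string outPath)
    {
        var buffer = new char[64 * 1024]; // arbitrary size, chosen for illustration

        using (var reader = new StreamReader(inPath))
        using (var writer = new StreamWriter(outPath))
        {
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                int kept = 0;
                for (int i = 0; i < read; i++)
                {
                    char c = buffer[i];
                    if (c >= '\x20' && c <= '\x7E')
                    {
                        buffer[kept++] = c; // compact the kept characters in place
                    }
                }
                writer.Write(buffer, 0, kept);
            }
        }
    }
}
```

It would be called as AsciiFilter.StripNonPrintableAscii(inFile, inFile + ".NonAsciiChars"). Because each character is tested on its own, it does not matter where a chunk boundary falls, and memory use stays constant whether the file is 2 GB or 4 GB.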
Upvotes: 3