Reputation: 453
Please, help me resolve this issue. I have a huge input.txt. Now it's 465 Mb, but later it will be 1Gb at least.
User enters a term (not a whole word). Using that term I need to find a word that contains it, put it between <strong>
tags and save the contents to the output.txt. The term-search should be case insensitive.
This is what I have so far. It works on small texts, but doesn't on bigger ones.
Regex regex = new Regex(" ");
string text = File.ReadAllText("input.txt");
Console.WriteLine("Please, enter a term to search for");
string term = Console.ReadLine();
string[] w = regex.Split(text);
for (int i = 0; i < w.Length; i++)
{
if (Processor.Contains(w[i], term, StringComparison.OrdinalIgnoreCase))
{
w[i] = @"<strong>" + w[i] + @"</string>";
}
}
string result = null;
result = string.Join(" ", w);
File.WriteAllText("output.txt", result);
Upvotes: 3
Views: 1840
Reputation: 186708
Try not to load the entire file into memory, avoid huge GB-size arrays, String
s etc. (you may just not have enough RAM). Can you process the file line by line (i.e. you don't have multiline terms, do you?)? If it's your case then
...
var source = File
.ReadLines("input.txt") // Notice absence of "All", not ReadAllLines
.Select(line => line.Split(' ')) // You don't need Regex here, just Split
.Select(items => items
.Select(item => String.Equals(item, term, StringComparison.OrdinalIgnoreCase)
? @"<strong>" + term + @"</strong>"
: item))
.Select(items => String.Join(" ", items));
File.WriteAllLines("output.txt", source);
Upvotes: 3
Reputation: 332
Read the file line by line (or buffer more lines). A bit slower but should work.
Also there can be a problem if all the lines match your term. Consider writing results in a temporary file when you find them and then just rename/move the file to the destination folder.
Upvotes: 1
Reputation: 1036
Trying to read the entire file in one go is causing your memory exception. Look into reading the file in stages. The FileStream and BufferedStream classes provide ways of doing this:
https://msdn.microsoft.com/en-us/library/system.io.filestream(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/system.io.bufferedstream.read(v=vs.110).aspx
Upvotes: 5