ShHolmes

Reputation: 453

Working on huge text file, C#. Modifying the file

Please help me resolve this issue. I have a huge input.txt. It is 465 MB now, but later it will be at least 1 GB.

The user enters a term (not necessarily a whole word). Using that term, I need to find every word that contains it, wrap that word in <strong> tags, and save the contents to output.txt. The term search should be case-insensitive.

This is what I have so far. It works on small texts, but fails on bigger ones.

Regex regex = new Regex(" "); 

string text = File.ReadAllText("input.txt"); 
Console.WriteLine("Please, enter a term to search for"); 
string term = Console.ReadLine(); 

string[] w = regex.Split(text); 

for (int i = 0; i < w.Length; i++) 
{ 
    if (Processor.Contains(w[i], term, StringComparison.OrdinalIgnoreCase)) 
    { 
        w[i] = @"<strong>" + w[i] + @"</strong>"; 
    } 
} 

string result = null; 
result = string.Join(" ", w); 

File.WriteAllText("output.txt", result);

Upvotes: 3

Views: 1840

Answers (3)

Dmitrii Bychenko

Reputation: 186708

Try not to load the entire file into memory, and avoid huge, gigabyte-sized arrays, strings, etc. (you may simply not have enough RAM). Can you process the file line by line (that is, your terms never span multiple lines)? If so, then

  ...
  var source = File
    .ReadLines("input.txt") // Notice absence of "All", not ReadAllLines
    .Select(line => line.Split(' ')) // You don't need Regex here, just Split 
    .Select(items => items
      .Select(item => item.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0 // word contains the term
         ? @"<strong>" + item + @"</strong>" // wrap the whole word, not just the term
         : item))
    .Select(items => String.Join(" ", items));

  File.WriteAllLines("output.txt", source);

Upvotes: 3

RokX

Reputation: 332

Read the file line by line (or buffer several lines at a time). It is a bit slower, but it should work.

There can also be a problem if every line matches your term. Consider writing the results to a temporary file as you find them, and then renaming or moving that file to the destination folder.
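
A minimal sketch of that idea, under some assumptions: the class name Highlighter, its method names, and the ".tmp" suffix for the temporary file are made up for illustration, and words are taken to be separated by single spaces as in the question. Each line is read, tagged, and written out immediately, so only one line is held in memory at a time; the finished temporary file is moved into place at the end.

```csharp
using System;
using System.IO;
using System.Linq;

public static class Highlighter
{
    // Wraps every space-separated word that contains `term`
    // (ignoring case) in <strong> tags.
    public static string HighlightLine(string line, string term)
    {
        var words = line.Split(' ').Select(word =>
            word.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
                ? "<strong>" + word + "</strong>"
                : word);
        return string.Join(" ", words);
    }

    // Streams inputPath line by line into a temporary file,
    // then moves that file to outputPath.
    public static void HighlightFile(string inputPath, string outputPath, string term)
    {
        // Assumption: the temporary file lives next to the destination file.
        string tempPath = outputPath + ".tmp";

        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(HighlightLine(line, term));
            }
        }

        // Replace any previous output only after the whole file was written.
        if (File.Exists(outputPath))
        {
            File.Delete(outputPath);
        }
        File.Move(tempPath, outputPath);
    }
}
```

It could be called as Highlighter.HighlightFile("input.txt", "output.txt", term). Note that an empty term matches every word, so you may want to reject it before processing.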

Upvotes: 1

Eric Yeoman

Reputation: 1036

Trying to read the entire file in one go is what causes your out-of-memory exception. Look into reading the file in stages instead. The FileStream and BufferedStream classes provide ways of doing this:

https://msdn.microsoft.com/en-us/library/system.io.filestream(v=vs.110).aspx

https://msdn.microsoft.com/en-us/library/system.io.bufferedstream.read(v=vs.110).aspx
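
As a rough sketch of reading in stages, assuming the file may contain very long lines (or no line breaks at all): the class name StreamHighlighter and the 64 KB buffer size are arbitrary choices for illustration. A FileStream is wrapped in a BufferedStream and a StreamReader, and the text is consumed one character at a time, so only the current word is kept in memory. Unlike the code in the question, this treats any whitespace as a word separator and copies it through unchanged.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamHighlighter
{
    // Copies text from reader to writer, wrapping each word that contains
    // `term` (ignoring case) in <strong> tags. A word is a run of
    // non-whitespace characters; whitespace is written through as-is.
    public static void Highlight(TextReader reader, TextWriter writer, string term)
    {
        var word = new StringBuilder();
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (char.IsWhiteSpace(c))
            {
                FlushWord(word, writer, term);
                writer.Write(c);
            }
            else
            {
                word.Append(c);
            }
        }
        FlushWord(word, writer, term); // last word, if the text does not end in whitespace
    }

    // Writes the buffered word (tagged if it contains the term) and resets the buffer.
    private static void FlushWord(StringBuilder word, TextWriter writer, string term)
    {
        if (word.Length == 0) return;
        string text = word.ToString();
        bool match = text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
        writer.Write(match ? "<strong>" + text + "</strong>" : text);
        word.Clear();
    }

    public static void HighlightFile(string inputPath, string outputPath, string term)
    {
        // Assumption: a 64 KB buffer; tune it for your disk and file size.
        using (var file = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        using (var buffered = new BufferedStream(file, 64 * 1024))
        using (var reader = new StreamReader(buffered))
        using (var writer = new StreamWriter(outputPath))
        {
            Highlight(reader, writer, term);
        }
    }
}
```

Memory use here is bounded by the longest single word rather than by the file or line length, which is usually small for ordinary text.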

Upvotes: 5
