Pixelspace

Reputation: 31

Find and replace data in file (without loading the entire thing)?

I want to replace some data in a file, but I do not know exactly where in this 200 MB file it appears. Is it possible to find these values (and replace them with something else) without loading the whole 200 MB+ file into memory?

Upvotes: 2

Views: 722

Answers (2)

Zverev Evgeniy

Reputation: 3712

Searching the file is not a problem. What you need is a FileStream, which you can get from the File.Open method. You can read through the file up to the bytes you need to replace. The problem arises when you need to insert something. A FileStream lets you overwrite some or all of the file contents from a particular byte onward and append new content at the end, but it does not let you insert data in the middle of the file. To get around this you need a temporary file. If that is acceptable, you could do the following:

  1. Open the FileStream on the original file.
  2. Create a temporary file that will hold the draft version.
  3. Search through the original file and copy all "good" data into temporary file up to the point where modifications are to be made.
  4. Insert modified and new data into the temporary file.
  5. Finish up the temporary file with the remaining "good" content from the original file.
  6. Replace the original file with the temporary one.
  7. Delete the temporary file.

You could use the Path.GetTempFileName method as a convenient way to get a temporary file. It creates an empty, uniquely named file and returns its path.
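The steps above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the class and method names (`StreamingReplace`, `ReplaceAll`) are made up for this example, it goes byte by byte and relies on the FileStream's own buffering, and it uses a naive comparison that is fine for short patterns. Matches are taken left to right and do not overlap.

```csharp
using System;
using System.IO;

static class StreamingReplace
{
    // Hypothetical helper: copies the file to a temporary file, substituting
    // every occurrence of pattern with replacement, then copies the result back.
    // Returns the number of replacements made.
    public static int ReplaceAll(string path, byte[] pattern, byte[] replacement)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

        string tempPath = Path.GetTempFileName();   // step 2
        int count = 0;
        try
        {
            using (FileStream input = File.Open(path, FileMode.Open, FileAccess.Read))        // step 1
            using (FileStream output = File.Open(tempPath, FileMode.Create, FileAccess.Write))
            {
                // Holds the most recent bytes that might still become a match.
                byte[] window = new byte[pattern.Length];
                int filled = 0;
                int next;
                while ((next = input.ReadByte()) != -1)
                {
                    window[filled++] = (byte)next;
                    if (filled < pattern.Length) continue;

                    if (Matches(window, pattern))
                    {
                        output.Write(replacement, 0, replacement.Length);   // step 4
                        filled = 0;
                        count++;
                    }
                    else
                    {
                        // The oldest byte cannot start a match: emit it and slide the window.
                        output.WriteByte(window[0]);                         // step 3
                        Array.Copy(window, 1, window, 0, filled - 1);
                        filled--;
                    }
                }
                output.Write(window, 0, filled);                             // step 5
            }
            File.Copy(tempPath, path, true);                                 // step 6
        }
        finally
        {
            File.Delete(tempPath);                                           // step 7
        }
        return count;
    }

    private static bool Matches(byte[] window, byte[] pattern)
    {
        for (int i = 0; i < pattern.Length; i++)
            if (window[i] != pattern[i]) return false;
        return true;
    }
}
```

Memory use stays at a few bytes plus the stream buffers regardless of file size, at the cost of needing free disk space for a second copy of the file while it runs.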

P.S. If you are modifying an exe, you are probably replacing text constants, so you need neither to insert new bytes nor to remove any, provided each replacement is exactly as long as the original. In that case you do not need to bother with the temporary file; the FileStream is all you need.
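For that same-length case, an in-place overwrite might look something like the sketch below. The names (`InPlacePatch`, `OverwriteAll`) are invented for illustration, and the method simply refuses replacements whose length differs from the pattern, since those would need the temporary-file route.

```csharp
using System;
using System.IO;

static class InPlacePatch
{
    // Hypothetical helper: overwrites every occurrence of pattern with an
    // equally long replacement, directly in the file. Returns the match count.
    public static int OverwriteAll(string path, byte[] pattern, byte[] replacement)
    {
        if (pattern.Length == 0 || pattern.Length != replacement.Length)
            throw new ArgumentException("Pattern and replacement must be non-empty and equal in length.");

        int count = 0;
        using (FileStream file = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
        {
            // Sliding window over the last pattern.Length bytes read.
            byte[] window = new byte[pattern.Length];
            int filled = 0;
            int next;
            while ((next = file.ReadByte()) != -1)
            {
                window[filled++] = (byte)next;
                if (filled < pattern.Length) continue;

                bool match = true;
                for (int i = 0; i < pattern.Length && match; i++)
                    match = window[i] == pattern[i];

                if (match)
                {
                    // Step back to the start of the match and overwrite it.
                    // Afterwards the position is where reading left off.
                    file.Seek(-pattern.Length, SeekOrigin.Current);
                    file.Write(replacement, 0, replacement.Length);
                    filled = 0;
                    count++;
                }
                else
                {
                    // Drop the oldest byte and keep scanning.
                    Array.Copy(window, 1, window, 0, filled - 1);
                    filled--;
                }
            }
        }
        return count;
    }
}
```

Because the file is modified directly, a crash halfway through leaves it partially patched, so working on a backup copy is a sensible precaution.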

P.P.S. When working with the FileStream, you decide on the size of the buffer you read from the file and write back. Keep in mind that this size is a tradeoff between memory consumption, I/O performance and the complexity of your code. Choose wisely. I would go byte by byte at first and, once that works, try to optimize by increasing the buffer to, say, 64 KB. You can count on the FileStream to buffer data; it does not perform disk I/O each time you request another byte from it. If you dive into buffering yourself, try not to fragment the Large Object Heap. The threshold in .NET 4.5 is 85,000 bytes.

Upvotes: 3

Prashant19sep

Reputation: 183

Just a thought: how about reading your file line by line, or maybe in chunks of bytes, and checking each chunk for the data that needs to be replaced? While reading, keep track of the file position you have read up to, so that when you find a match you can go back to that location and overwrite exactly the bytes you have targeted.
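One thing to watch with chunks is a match that straddles two of them. A rough sketch of a chunked search that handles this by carrying the last `pattern.Length - 1` bytes over into the next chunk is shown below. The names (`ChunkedSearch`, `FindOffsets`) and the 64 KB default are assumptions for the example, not anything prescribed; the method only collects the file offsets of matches (overlapping ones included), which you could then seek to and overwrite.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedSearch
{
    // Hypothetical helper: returns the file offset of every occurrence of pattern,
    // reading the file chunkSize bytes at a time.
    public static List<long> FindOffsets(string path, byte[] pattern, int chunkSize = 64 * 1024)
    {
        if (pattern.Length == 0) throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var offsets = new List<long>();
        int carry = pattern.Length - 1;               // bytes kept between chunks
        byte[] buffer = new byte[chunkSize + carry];
        int kept = 0;                                 // carried bytes currently at the buffer start
        long bufferStart = 0;                         // file offset of buffer[0]

        using (FileStream file = File.Open(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            // Read may return fewer bytes than requested, so always use its result.
            while ((read = file.Read(buffer, kept, chunkSize)) > 0)
            {
                int available = kept + read;
                for (int i = 0; i + pattern.Length <= available; i++)
                {
                    if (IsMatchAt(buffer, i, pattern))
                        offsets.Add(bufferStart + i);
                }

                // Keep the tail so a match spanning the chunk boundary is still found.
                int keep = Math.Min(carry, available);
                Array.Copy(buffer, available - keep, buffer, 0, keep);
                bufferStart += available - keep;
                kept = keep;
            }
        }
        return offsets;
    }

    private static bool IsMatchAt(byte[] buffer, int start, byte[] pattern)
    {
        for (int i = 0; i < pattern.Length; i++)
            if (buffer[start + i] != pattern[i]) return false;
        return true;
    }
}
```

Reading line by line has the same boundary problem whenever the data can contain line breaks, and it is not meaningful for binary files, so byte chunks are usually the safer choice.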

Upvotes: 1
