Reputation: 4928

Combining FileStream and MemoryStream to avoid disk accesses/paging while receiving gigabytes of data?

I'm receiving a file as a stream of byte[] data packets (the total size isn't known in advance), and I need to store it somewhere so I can process it as soon as it has been fully received (I can't do the processing on the fly). The total received file size can vary from as little as 10 KB to over 4 GB.

I did some testing and, most of the time, there seems to be little performance difference between, say, 10,000 consecutive calls of MemoryStream.Write() and FileStream.Write(), but a lot seems to depend on the buffer size and the total amount of data in question (i.e., the number of writes). Obviously, MemoryStream size reallocation is also a factor.

  1. Does it make sense to use a combination of MemoryStream and FileStream, i.e. write to memory stream by default, but once the total amount of data received is over e.g. 500 MB, write it to FileStream; then, read in chunks from both streams for processing the received data (first process 500 MB from the MemoryStream, dispose it, then read from FileStream)?

  2. Another solution is to use a custom memory stream implementation that doesn't require contiguous address space for its internal array allocation (i.e., a linked list of memory streams); this way, at least in 64-bit environments, out-of-memory exceptions should no longer be an issue. Con: extra work, more room for mistakes.
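
For reference, option 1 could be sketched roughly as follows. This is only an illustration of the idea, not a tested design: the SpillBuffer class, its member names, and the threshold parameter are all hypothetical, and the 64 KB file buffer is an arbitrary choice.

```csharp
using System;
using System.IO;

// Hypothetical helper: keeps data in memory until a threshold is reached,
// then sends all further writes to a temporary file.
public sealed class SpillBuffer : IDisposable
{
    private readonly long _thresholdBytes;
    private readonly MemoryStream _memory = new MemoryStream();
    private FileStream _file; // created lazily on the first write past the threshold

    public SpillBuffer(long thresholdBytes)
    {
        _thresholdBytes = thresholdBytes;
    }

    public bool SpilledToDisk
    {
        get { return _file != null; }
    }

    public void Write(byte[] packet, int offset, int count)
    {
        if (_file == null && _memory.Length + count > _thresholdBytes)
        {
            // Assumption: a temp file that is deleted when the stream is closed.
            _file = new FileStream(Path.GetTempFileName(), FileMode.Create,
                FileAccess.ReadWrite, FileShare.None, 65536, FileOptions.DeleteOnClose);
        }

        if (_file != null)
            _file.Write(packet, offset, count);
        else
            _memory.Write(packet, offset, count);
    }

    // Replays the received data in order: the in-memory part first, then the file part.
    public void CopyTo(Stream destination)
    {
        _memory.Position = 0;
        _memory.CopyTo(destination);
        if (_file != null)
        {
            _file.Position = 0;
            _file.CopyTo(destination);
        }
    }

    public void Dispose()
    {
        _memory.Dispose();
        if (_file != null)
            _file.Dispose();
    }
}
```

A caller would call Write() once per received packet and CopyTo() (or an equivalent chunked read) once the transfer is complete.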

So how do FileStream and MemoryStream reads/writes behave in terms of disk access and memory caching, i.e., what is the balance between data size and performance? I would expect that as long as enough RAM is available, FileStream would internally read/write from memory (cache) anyway, and virtual memory would take care of the rest. But I don't know how often FileStream will explicitly access the disk when being written to.

Any help would be appreciated.

Upvotes: 4

Views: 5096

Answers (3)

Hans Passant

Reputation: 941525

No, trying to optimize this doesn't make any sense. Windows itself already caches file writes; they are buffered by the file system cache. So your test is about right: both MemoryStream.Write() and FileStream.Write() actually write to RAM and have no significant perf differences. The file system driver lazily writes the data to disk in the background.

The RAM used for the file system cache is what's left over after processes have claimed their RAM needs. By using a MemoryStream, you reduce the effectiveness of the file system cache. In other words, you trade one for the other without benefit. You're in fact worse off, since you use double the amount of RAM.

Don't help; this is already heavily optimized inside the operating system.

Upvotes: 5

Jim Mischel

Reputation: 133995

Use a FileStream constructor that allows you to define the buffer size. For example:

using (var outputFile = new FileStream("filename",
    FileMode.Create, FileAccess.Write, FileShare.None, 65536))
{
    // write the received packets to outputFile here
}

The default buffer size is 4K. Using a 64K buffer reduces the number of calls to the file system. A larger buffer will reduce the number of writes further, but each write starts to take longer. Empirical data (many years of working with this stuff) indicates that 64K is a very good choice.

As somebody else pointed out, the file system will likely do further caching, and do the actual disk write in the background. It's highly unlikely that you'll receive data faster than you can write it to a FileStream.
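
As a rough sketch of how the receive loop might look with that constructor: the PacketSink class, the SaveTo method, and the IEnumerable<byte[]> packet source are placeholders of mine, since the question doesn't say how the packets actually arrive.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PacketSink
{
    // Writes each packet to the file through a 64 KB FileStream buffer
    // and returns the total number of bytes written.
    public static long SaveTo(string path, IEnumerable<byte[]> packets)
    {
        long total = 0;
        using (var outputFile = new FileStream(path,
            FileMode.Create, FileAccess.Write, FileShare.None, 65536))
        {
            foreach (byte[] packet in packets)
            {
                // Small writes accumulate in the stream's buffer; the file
                // system is only called when the buffer fills or on dispose.
                outputFile.Write(packet, 0, packet.Length);
                total += packet.Length;
            }
        }
        return total;
    }
}
```

Because the total size isn't known in advance, the loop simply runs until the packet source is exhausted; nothing here depends on knowing the length up front.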

Upvotes: 3

Tim S.

Reputation: 56536

Since recent versions of Windows enable write caching by default, I'd say you could simply use FileStream and let Windows manage when or if anything actually is written to the physical hard drive.

If these files don't stick around after you've received them, you should probably write the files to a temp directory and delete them when you're done with them.
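
One way to do that, as a minimal sketch: open the temp file with FileOptions.DeleteOnClose so it is removed when the stream is disposed. The TempReceive class and its Create method are made-up names, and the 64 KB buffer size is just an assumption.

```csharp
using System;
using System.IO;

public static class TempReceive
{
    // Creates a uniquely named file in the temp directory that is
    // deleted automatically when the returned stream is closed.
    public static FileStream Create()
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        return new FileStream(path, FileMode.CreateNew, FileAccess.ReadWrite,
            FileShare.None, 65536, FileOptions.DeleteOnClose);
    }
}
```

You would write the incoming packets to the stream, set Position back to 0 to process the data, and then dispose the stream, at which point the file goes away.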

Upvotes: 3
