Reputation: 7005
I am finding that "loading" a file into memory can take very different amounts of time, even when my machine does not appear to be doing much else. I have attached some code to illustrate the issue; the output follows it.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;
using System.Runtime.InteropServices;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            LoadFileUnman();
            LoadFileUnman();
            LoadFileUnman();
            LoadFileUnman();
            LoadFileUnman();
            Console.WriteLine("Done");
        }

        // Copies the file into an unmanaged buffer, timing only the copy.
        // Returns true on failure, false on success.
        public unsafe bool LoadFileUnman()
        {
            string filename = @"C:\DataFile.BNF";
            var fileStream = new FileStream(filename,
                                            FileMode.Open,
                                            FileAccess.Read,
                                            FileShare.Read,
                                            16 * 1024,
                                            FileOptions.SequentialScan);
            if (fileStream == null)
            {
                Console.WriteLine("Could not open file");
                return true;
            }

            Int64 length = fileStream.Length;
            Console.WriteLine("File length: " + length.ToString("#,###"));

            UnmanagedMemoryStream GlobalMS;
            IntPtr GlobalBuffer;
            try
            {
                // Allocate an unmanaged buffer the size of the whole file.
                GlobalBuffer = Marshal.AllocHGlobal(new IntPtr(length));
            }
            catch (Exception er)
            {
                Console.WriteLine("Could not allocate memory: " + er.Message);
                return true;
            }

            unsafe
            {
                byte* pBytes = (byte*)GlobalBuffer.ToPointer();
                GlobalMS = new UnmanagedMemoryStream(pBytes, length, length, FileAccess.ReadWrite);

                // Time only the copy from the file into the unmanaged buffer.
                DateTime befDT = DateTime.Now;
                fileStream.CopyTo(GlobalMS);
                Console.WriteLine("Load took: " + DateTime.Now.Subtract(befDT).TotalMilliseconds.ToString("#,###") + "ms");

                GlobalMS.Seek(0, SeekOrigin.Begin);
            }

            GlobalMS.Close();
            fileStream.Close();
            // Note: GlobalBuffer is never released with Marshal.FreeHGlobal,
            // so each call leaks an unmanaged buffer the size of the file.
            return false;
        }
    }
}
Here is the output. The timings differ even more when I use bigger files (10 GB); then a load sometimes takes a few seconds and sometimes as long as a minute.
File length: 178,782,404
Load took: 5,125ms
File length: 178,782,404
Load took: 156ms
File length: 178,782,404
Load took: 172ms
File length: 178,782,404
Load took: 141ms
File length: 178,782,404
Load took: 1,891ms
Can anyone tell me why it is so variable, and whether there is anything I can do about it?
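(For what it's worth: DateTime.Now typically only ticks every ~15 ms, so for the faster runs a Stopwatch gives more trustworthy numbers. A sketch of the same timing inside LoadFileUnman:)

// Sketch: the same copy timed with the high-resolution Stopwatch
// instead of DateTime.Now's coarse clock tick.
var sw = System.Diagnostics.Stopwatch.StartNew();
fileStream.CopyTo(GlobalMS);
sw.Stop();
Console.WriteLine("Load took: " + sw.ElapsedMilliseconds.ToString("#,###") + "ms");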
EDIT 1
From the comments I have had, it seems a good idea to highlight that what I need is a way to fix the variability of the load, NOT the overall speed. I can increase the speed by optimising in various ways (and I have), but it is the difference between consecutive load times that is the issue.
EDIT 2
Here are the services that I am running. I would be grateful if anyone notices any that might cause me problems.
Upvotes: 3
Views: 2285
Reputation: 1
Windows is doing things behind the scenes, which makes it 'impossible' to control or test what is really happening. Windows has its own layer of buffering on top of everything else: a FileStream flush does not flush the data to disk, but rather hands it to the OS, which writes it out whenever it chooses.
Open Resource Monitor (it can be started from Task Manager) and you may well see a system process reading and writing the same file as your application.
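For writes, a minimal sketch of asking Windows to actually hit the disk (the path and sizes are placeholders; FileStream.Flush(bool) needs .NET 4 or later): FileOptions.WriteThrough tells the OS to write through its cache, and Flush(true), unlike plain Flush(), also flushes the OS file buffers.

// Sketch: write-through plus an explicit flush to disk.
using (var fs = new FileStream(@"C:\Out.BNF", FileMode.Create, FileAccess.Write,
                               FileShare.None, 16 * 1024, FileOptions.WriteThrough))
{
    byte[] block = new byte[16 * 1024];
    fs.Write(block, 0, block.Length);
    fs.Flush(true); // flush the stream's buffer AND ask Windows to flush its cache
}

Even then, Windows may still read ahead and cache on the read side, which is likely what shows up as the variable load times in the question.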
All I want is the best sequential read and write speeds for large files, but thanks to 'smart' system behaviour like this, along with the 'excellent' MS documentation, I am really stuck. I guess I'll do the same as everyone else: whatever works. A sad state of affairs.
Upvotes: 0
Reputation: 33139
It depends on many factors, such as what else your PC is doing at the time, the fragmentation of the disk, whether memory is (almost) full, etc.
There really isn't much you can do except optimize your environment:
If the files you read are copies, you can read them from a RAM disk: have a background process copy the files onto the RAM disk, and let your program read them from there. That is also significantly faster than reading from a physical disk.
See also http://www.softperfect.com/products/ramdisk/ for RAM disk software.
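A minimal sketch of that staging step (the R: drive letter for the RAM disk is an assumption, and the usings are the same as in the question's code):

// Sketch: copy the data file onto a RAM disk, then read it from there.
string source = @"C:\DataFile.BNF";
string staged = @"R:\DataFile.BNF"; // R: assumed to be the RAM disk

File.Copy(source, staged, true); // ideally done ahead of time by a background process

using (var fs = new FileStream(staged, FileMode.Open, FileAccess.Read,
                               FileShare.Read, 16 * 1024, FileOptions.SequentialScan))
{
    // ... load from fs exactly as before; reads now come from RAM
}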
EDIT: From your image I notice the following services, which may impact performance (note that this list is non-exhaustive; there may be other services I didn't notice that cause delays):
Upvotes: 3
Reputation: 12316
Things to consider:
It would be interesting to see the results if you ran that more than 5 times.
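For example, a sketch that wraps the question's LoadFileUnman in a longer run and summarises the spread (Min/Max/Average come from the question's existing using System.Linq):

// Sketch: 50 runs instead of 5, with a summary of the spread.
var times = new List<double>();
for (int i = 0; i < 50; i++)
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    LoadFileUnman();
    sw.Stop();
    times.Add(sw.Elapsed.TotalMilliseconds);
}
Console.WriteLine("min {0:#,###}ms  max {1:#,###}ms  avg {2:#,###}ms",
                  times.Min(), times.Max(), times.Average());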
Some additional info:
An I/O-bound process waiting for disk is boosted in priority so it can handle the data immediately; most OSes do this as part of their scheduler architecture. This means that a moderately busy system should usually not have a big impact on the process ... unless they share some slow device. Disk is a slow device, but it's easy to forget that memory is a relatively slow device too and should be shared with care.
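If scheduling is suspected, one simple knob to try (a sketch; whether it helps with this particular jitter is untested) is raising the process priority class before the runs:

// Sketch: raise this process's priority so the loading thread is scheduled promptly.
using (var me = System.Diagnostics.Process.GetCurrentProcess())
{
    me.PriorityClass = System.Diagnostics.ProcessPriorityClass.High;
}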
For parallelism (assuming you are writing server software): my MSSQL server has its DB/log spread over effectively 28 disks, and the server contains several boards with several CPUs, each with separate bus access to separate memory, plus some cross-connections. MSSQL uses this to allocate parts of the DB to the memory closest to each CPU, and searches run in parallel on all CPUs against their closest memory (see NUMA). My point is that there is hardware designed specifically to boost scenarios like this.
Upvotes: 1
Reputation: 7304
The first time you instantiate the buffer, the OS has to search for free memory. For a 10 GB file that space clearly has to be found on disk (in the page file), hence the huge delay. When you redo the task, the memory is still available before it gets reclaimed.
You may be able to verify this by placing a GC.Collect() after each LoadFileUnman() call within the button handler.
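A sketch of that suggestion applied to the question's button handler (note that GC.Collect() only collects managed memory; the AllocHGlobal buffer itself is unmanaged and is only returned by a matching Marshal.FreeHGlobal):

private void button1_Click(object sender, EventArgs e)
{
    for (int i = 0; i < 5; i++)
    {
        LoadFileUnman();
        GC.Collect();                  // force a managed collection between runs
        GC.WaitForPendingFinalizers(); // let any finalizers run to completion
    }
    Console.WriteLine("Done");
}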
Upvotes: 0