Reputation: 183
My application needs to open a lot of small files: say, 1440 files, each containing one minute of data, to read all the data for a given day. Each file is only a couple of kB big. This is for a GUI application, so I want the user (== me!) to not have to wait too long.
It turns out that opening the files is rather slow. After some research, it turns out most of the time is wasted creating a FileStream (OpenStream = new FileStream) for each file. Example code:
// create the stream and reader
FileStream OpenStream;
BinaryReader bReader;
foreach (string file in files)
{
    // does the file exist? then read it and store it
    if (System.IO.File.Exists(file))
    {
        long Start = sw.ElapsedMilliseconds;
        // open the file read-only, otherwise the application can crash
        OpenStream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        Tijden.Add(sw.ElapsedMilliseconds - Start);
        bReader = new BinaryReader(OpenStream);
        // read everything in one go, works well and fast
        // - track whether appending is still possible; if necessary, stop appending
        blAppend &= Bestanden.Add(file, bReader.ReadBytes((int)OpenStream.Length), blAppend);
        // close the file (closing the reader also closes the underlying stream)
        bReader.Close();
    }
}
Using the stopwatch timer, I see that most (> 80%) of the time is spent creating the FileStream for each file. Creating the BinaryReader and actually reading the file (Bestanden.Add) take almost no time.
I'm baffled by this and cannot find a way to speed it up. What can I do to speed up the creation of the FileStream?
Upvotes: 11
Views: 3126
Reputation: 29052
Disclaimer: this answer is just (founded) speculation that this is a Windows bug rather than something you can fix with different code.
So this behaviour might relate to the Windows bug described here: "24-core CPU and I can’t move my mouse".
These processes were all releasing the lock from within NtGdiCloseProcess.
So if FileStream uses and holds such a critical lock in the OS, it would wait a few microseconds for every file, which adds up over thousands of files. It may be a different lock, but the bug mentioned above at least shows that a similar problem is possible.
To prove or disprove this hypothesis, some deep knowledge of the inner workings of the kernel would be necessary.
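Short of that, one way to gather evidence from user mode is to look at the distribution of the individual open times rather than the total: lock contention like the above tends to show up as a long tail of slow outliers, while a constant per-open cost gives a flat distribution. A minimal sketch along the lines of the Stopwatch timing in the question (files is the same list of paths as in the question):
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

static void MeasureOpenTimes(IEnumerable<string> files)
{
    // time only the FileStream constructor, as in the question
    var times = new List<double>();
    var sw = new Stopwatch();
    foreach (string file in files)
    {
        sw.Restart();
        var fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        sw.Stop();
        fs.Dispose();
        times.Add(sw.Elapsed.TotalMilliseconds);
    }
    times.Sort();
    // a median close to the maximum suggests a constant per-open cost;
    // a long tail (high p99/max versus median) points at contention instead
    Console.WriteLine("median {0:F3} ms, p99 {1:F3} ms, max {2:F3} ms",
        times[times.Count / 2],
        times[(int)(times.Count * 0.99)],
        times[times.Count - 1]);
}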
Upvotes: 0
Reputation: 1915
As you mentioned in the comments to the question, FileStream reads the first 4 KB into its buffer when the object is created. You can change the size of this buffer to better match your data (decrease it if your files are smaller than the buffer, for example). If you read a file sequentially, you can give the OS a hint about this through FileOptions. In addition, you can avoid BinaryReader, because you read the files entirely.
// create the stream
FileStream OpenStream;
foreach (string file in files)
{
    // does the file exist? then read it and store it
    if (System.IO.File.Exists(file))
    {
        long Start = sw.ElapsedMilliseconds;
        // open the file read-only, otherwise the application can crash
        OpenStream = new FileStream(
            file,
            FileMode.Open,
            FileAccess.Read,
            FileShare.ReadWrite,
            bufferSize: 2048, // 2 KB, for example
            options: FileOptions.SequentialScan);
        Tijden.Add(sw.ElapsedMilliseconds - Start);
        var bufferLength = (int)OpenStream.Length;
        var buffer = new byte[bufferLength];
        OpenStream.Read(buffer, 0, bufferLength);
        // close the file
        OpenStream.Close();
        // read everything in one go, works well and fast
        // - track whether appending is still possible; if necessary, stop appending
        blAppend &= Bestanden.Add(file, buffer, blAppend);
    }
}
I do not know the type of the Bestanden object, but if it has methods for reading from an array, you can also reuse the buffer across files.
// the buffer should be bigger than the biggest file to read
var bufferLength = 8192;
var buffer = new byte[bufferLength];
foreach (string file in files)
{
    //skip
    ...
    var fileLength = (int)OpenStream.Length;
    OpenStream.Read(buffer, 0, fileLength);
    blAppend &= Bestanden.Add(file, /*read bytes from buffer */, blAppend);
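A complete version of that idea might look like this (just a sketch: the Bestanden.Add overload taking the shared buffer plus a byte count is hypothetical, since I do not know the real signature):
// the buffer must be at least as large as the biggest file to read
var buffer = new byte[8192];
foreach (string file in files)
{
    if (!System.IO.File.Exists(file))
        continue;

    using (var OpenStream = new FileStream(
        file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite,
        bufferSize: 2048, options: FileOptions.SequentialScan))
    {
        var fileLength = (int)OpenStream.Length;
        OpenStream.Read(buffer, 0, fileLength);
        // hypothetical overload: hand over the shared buffer and the number
        // of valid bytes so that no per-file byte[] is allocated
        blAppend &= Bestanden.Add(file, buffer, fileLength, blAppend);
    }
}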
I hope it helps.
Upvotes: 2