Reputation: 123
I wanted to release my application because it was working well. But when I ran it as a packaged application, it used a tremendous amount of memory, up to 20 GB and more. Running it from Visual Studio 2019, I can't reproduce the problem. The packaged build also runs for a very long time, whereas in VS it finishes in a few seconds even with many files.
I think I have tried almost everything so far: removed all the parallelism, switched from ConcurrentQueue to Queue, etc. But the problem persists.
https://github.com/rmoergeli/ConsoleAppConcurrentMemoryProblem
Both in VS and as a standalone executable, you have to "Run as administrator".
Upvotes: 0
Views: 149
Reputation: 131180
It looks like you're trying to process all files on a drive in parallel, and probably end up caching every file in memory, because the worker methods end up blocking each other. Running under a debugger is too slow for this to happen.
.NET already provides mechanisms for processing individual messages in a pipeline of steps with multiple threads, and constraining how many items can be in memory at a time.
It's quite likely all you need is a single ActionBlock or TransformBlock. Those classes accept messages in their input buffers and process them in order using one or more worker tasks. They also allow setting a bound on their input buffers, so the buffers can't grow without limit if the workers are too slow.
To solve your problem, perhaps all you need is an ActionBlock with a DOP > 1 and a bounded input buffer, e.g.:
//Requires the System.Threading.Tasks.Dataflow NuGet package
var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 5,
    BoundedCapacity = 10
};
var block = new ActionBlock<string>(filePath =>
{
    //Do something with that file
}, options);

//Feed all files to the block
foreach (var file in Directory.EnumerateFiles("C:\\", "*", SearchOption.AllDirectories))
{
    await block.SendAsync(file);
}
block.Complete();
await block.Completion;
This will feed all files to the block, pausing whenever 10 items are already waiting in the input buffer. At most 5 items will be processed concurrently, each by one of the block's worker tasks. In the end, it awaits the block's Completion so that all buffered messages are processed before exiting.
The C:\ folder contains inaccessible folders that would throw an exception if you tried to enumerate them. To avoid this, you can use EnumerationOptions instead of SearchOption:
var enumOptions = new EnumerationOptions
{
    IgnoreInaccessible = true,
    RecurseSubdirectories = true
};
foreach (var file in Directory.EnumerateFiles("C:\\", "*", enumOptions))
{
    await block.SendAsync(file);
}
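If the per-file work has distinct stages (for example reading each file and then processing its text), a TransformBlock can be linked to the ActionBlock so that each stage gets its own bounded buffer and degree of parallelism. A rough sketch, assuming .NET Core / .NET 5+ for File.ReadAllTextAsync, with ProcessContents standing in for your own processing step:
//Stage 1: read each file's text. Bounded so file contents aren't buffered without limit.
var readBlock = new TransformBlock<string, string>(
    path => File.ReadAllTextAsync(path),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2, BoundedCapacity = 10 });

//Stage 2: process the text. ProcessContents is a placeholder for your own method.
var processBlock = new ActionBlock<string>(
    contents => ProcessContents(contents),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5, BoundedCapacity = 10 });

//Link the stages and propagate completion from the first block to the second.
readBlock.LinkTo(processBlock, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var file in Directory.EnumerateFiles("C:\\", "*", enumOptions))
{
    await readBlock.SendAsync(file);
}
readBlock.Complete();
await processBlock.Completion;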
Problems with the current code
The code is overcomplicated and uses multiple concurrency and parallelism constructs in inappropriate or even conflicting ways. It looks like it's trying to process all the files on a local disk, but ends up loading every file (or at least every file name or FileInfo object) into memory while waiting for some blocked methods to process them. Most likely, running this in Visual Studio is too slow to exhibit this behaviour.
For starters, BackgroundWorker is obsolete, completely replaced since 2012 by Task and the Progress<T> class for progress reporting. That's 8 years; there's no valid reason to use it any more.
A Task is not a thread, it's just a job (task) that will run on a thread at some point. It's not meant to stay alive for a long time and definitely not meant to act as a thread with a loop.
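A rough sketch of the Task + IProgress<T> pattern that replaces BackgroundWorker, where ProcessFiles is a placeholder for your own work:
//Progress<T> posts its callback to the SynchronizationContext captured at construction,
//e.g. the UI thread in a WinForms/WPF app, much like BackgroundWorker.ProgressChanged did.
var progress = new Progress<int>(percent => Console.WriteLine($"{percent}% done"));

//Task.Run queues the work to the thread pool; the Task represents the job, not a dedicated thread.
await Task.Run(() => ProcessFiles(progress));

void ProcessFiles(IProgress<int> reporter)
{
    for (int i = 0; i <= 100; i += 10)
    {
        //...do a slice of the real work here...
        reporter.Report(i);
    }
}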
PLINQ and the methods of the Parallel class are meant for parallelism - CPU-heavy processing of lots of in-memory data using all available cores. They aren't meant for concurrency, where different operations need to happen at the same time, and especially not for IO scenarios that don't need the CPU.
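For contrast, a minimal sketch of where each belongs (the array and the file path are just placeholders, and File.ReadAllTextAsync assumes .NET Core / .NET 5+):
//Parallelism: CPU-bound work over in-memory data, spread across all cores.
var numbers = Enumerable.Range(1, 1_000_000).ToArray();
Parallel.ForEach(numbers, n =>
{
    var root = Math.Sqrt(n); //stand-in for real CPU-heavy work
});

//Concurrency: IO-bound work. No thread is blocked while the OS performs the read.
var text = await File.ReadAllTextAsync(@"C:\temp\sample.txt");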
Adding semaphores only makes things worse by getting all those constructs to block each other.
Upvotes: 1