Reputation: 20468
I am really confused about Parallel.ForEach... How does it work?
The code below throws an error: "File Is In Use"
Parallel.ForEach(list_lines_acc, (line_acc, list_lines_acc_state) =>
{
    FileStream file =
        new FileStream(GPLfilePath, FileMode.Open, FileAccess.ReadWrite);
    StreamReader reader = new StreamReader(file);
    var processed = string.Empty;
    Ok_ip_port = string.Empty;
    while (!reader.EndOfStream)
    {
        if (string.IsNullOrEmpty(Ok_ip_port))
        {
            Ok_ip_port = reader.ReadLine();
        }
        else
        {
            string currentLine = reader.ReadLine();
            processed += currentLine + Environment.NewLine;
        }
    }
    StreamWriter writer = new StreamWriter(file);
    writer.Write(processed);
    reader.Close();
    writer.Close();
    file.Close();
});
Would you please show me how I can fix that? This code is just an example.
I want to work with string arrays & Lists inside Parallel.ForEach, but there is always a problem with adding to or editing those collections. Can you please provide an example? I am using Visual Studio 2010 + .NET Framework 4.0.
Upvotes: 2
Views: 12758
Reputation: 5160
Use a lock around the problematic code. Execution will wait for the lock to be released, so you will never have multiple threads accessing the resource. Parallel.ForEach won't add performance in this case. Here is a simple example:
private Object fileLock = new Object();

private void WriteLog(string line)
{
    lock (fileLock)
    {
        string strNomLog = @".\MyFile.log";
        System.IO.File.AppendAllText(strNomLog, line);
    }
}
Upvotes: 1
Reputation: 564821
In your code, as written, each thread is using the same file, and effectively trying to append to it. Even if this could work, you would have a bad race condition (as the threads would be trying to append to the same file simultaneously).
The error you're seeing is purely because you're using the same file in each loop iteration, so when you try to open the file (after the first iteration), it's erroring out as it's opened by a different loop iteration.
Also, you're never using your loop variable (line_acc), so there is really no need for a loop here at all. This could be written without the Parallel.ForEach, and you would have the same result, with no issues.
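A minimal sequential sketch of what the loop body appears to be doing (reading the first line into a variable and rewriting the rest); the file path and the intent are assumptions based on the question's code:

```csharp
// Hypothetical sequential version: read the file once, capture the first
// line, and write the remaining lines back. No Parallel.ForEach is needed,
// and the file is only opened by one reader/writer at a time.
using System;
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
        string gplFilePath = "GPL.txt"; // placeholder path

        string[] lines = File.ReadAllLines(gplFilePath);

        // First line goes into Ok_ip_port, as in the original loop body.
        string okIpPort = lines.FirstOrDefault() ?? string.Empty;

        // Everything after the first line is written back to the file.
        File.WriteAllLines(gplFilePath, lines.Skip(1).ToArray());

        Console.WriteLine(okIpPort);
    }
}
```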
That being said - if this is example code, you'll tend to find that loops that are bound purely by file I/O will tend to not parallelize well. The actual drive being used will become the limiting factor, so running code that purely reads and writes to a file in parallel will often cause the resulting code to run slower, not faster, than running it sequentially.
I want to work with string arrays & Lists inside Parallel.ForEach, but there is always a problem for adding or editing those collections
The code you're showing "as an example" is doing none of this, so it's difficult to see where your issue might be occurring. You can write to an array or List<T> by index, but you can't add to a list in a parallel loop without extra synchronization (such as a lock), as List<T> is not thread safe for writes. If you are trying to read and write from collections, you might consider looking at the System.Collections.Concurrent namespace, which contains thread-safe collections you can safely use in Parallel.ForEach loops.
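A short sketch of that suggestion, using ConcurrentBag<T> (available in .NET 4.0) to collect results from a parallel loop; the input strings are made up for illustration:

```csharp
// Sketch: adding to a thread-safe collection from Parallel.ForEach.
// A plain List<T>.Add here could corrupt the list; ConcurrentBag<T>.Add
// is safe to call from multiple threads at once.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var inputs = new[] { "10.0.0.1", "10.0.0.2", "10.0.0.3" };

        var results = new ConcurrentBag<string>();

        Parallel.ForEach(inputs, ip =>
        {
            results.Add(ip + ":8080"); // safe concurrent Add
        });

        Console.WriteLine(results.Count); // 3 items, in no guaranteed order
    }
}
```

ConcurrentBag<T> is unordered; if you need results keyed to their inputs, ConcurrentDictionary<TKey, TValue> from the same namespace is another option.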
Upvotes: 7
Reputation: 35869
To get rid of a file in use error (assuming that it's in use because another thread is writing to it) you have to synchronize access to the file. This generally means that each parallel execution is waiting on other executions to finish writing and defeats the purpose of running in parallel.
Upvotes: 0
Reputation: 12375
As noted in this question:
You are not synchronizing access to index and that means you have a race on it. That's why you have the error. For illustrative purposes, you can avoid the race and keep this particular design by using Interlocked.Increment.
private static void Func<T>(IEnumerable<T> docs)
{
    int index = -1;
    Parallel.ForEach(
        docs, doc =>
        {
            int nextIndex = Interlocked.Increment(ref index);
            CreateFolderAndCopyFile(nextIndex);
        }
    );
}
However, as others suggest, the alternative overload of ForEach that provides a loop index is clearly a cleaner solution to this particular problem.
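A sketch of that overload: the third lambda parameter is a long giving the element's index, so no manual counter is needed (the doc names below are made up):

```csharp
// Sketch: the Parallel.ForEach overload whose lambda receives
// (item, ParallelLoopState, long index), removing the need for an
// Interlocked counter.
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var docs = new[] { "a.txt", "b.txt", "c.txt" };

        Parallel.ForEach(docs, (doc, state, index) =>
        {
            // index is the 64-bit position of doc in the source sequence.
            Console.WriteLine("{0}: {1}", index, doc);
        });
    }
}
```

Note that although each index corresponds to the element's position in the source, the iterations (and therefore the printed lines) can run in any order.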
But when you get it working you will find that copying files is IO bound rather than processor bound and I predict that the parallel code will be slower than the serial code.
Upvotes: 2